Preface


Why did I write this book? This book started in 2011 as a set 
of lecture notes called Graph Theory: Penn State Math 485 Lecture 
Notes. I wrote a portion of those notes while staying in Key West, 
Florida. I highly recommend writing while in Key West. Actually, I 
recommend doing just about anything in Key West (or some other 
tropical place) if you can get there. 

Until 18 months ago, I was perfectly content to ignore my col- 
leagues when they told me that I should turn these notes into a book. 
Nevertheless, here we are, and I should tell you something about this 
book and why it’s different from all the other graph theory books. 
To understand that, it’s best to understand why I wrote the lecture 
notes in the first place. 

Math 485 is Penn State Math’s advanced undergraduate course on 
graph theory. It is taken by students after they have passed a course 
in discrete mathematics with a focus on proof. I’ve now taught this 
course several times. Originally, I wrote the lecture notes because, 
when I was young, I lived in terror of getting lost in the middle of a 
lecture and wasting 10 minutes doing the “absent-minded professor 
thing.” I don’t worry about that as much anymore. Key West can 
help with things like this as well. 

The first few times I taught the course, I included a module on 
linear programming and covered network flows using the formalism of 
the Karush–Kuhn–Tucker conditions rather than using a traditional
approach. I rather liked this, and it helped introduce the students to 
connections across mathematics. I am a fan of algebraic graph theory, 
so we covered material on graphs and matrices prior to jumping into
linear programming. 

While I was at the United States Naval Academy, I taught SA403: 
Graph and Network Algorithms. This class needed to cover classic 
graph algorithms, so I changed up the notes to cover classic network 
flow. I also taught a class called Intermediate Linear Algebra, which 
had an emphasis on applications of linear algebra. This was the per- 
fect place to use topics from algebraic graph theory, so I wrote some 
new material on the Laplacian matrix and spectral clustering.

Since I usually teach undergraduates, and most (but not all) 
undergraduates like to see some applications, I also tried to cover 
as many applications as possible without disrupting the flow of a 
“theorem-proof”-type class. Letting the theory drive the applications 
suits me perfectly because I’m a very applied mathematician. 

Eighteen months ago, a very nice editor named Rochelle from 
World Scientific contacted me and asked if I’d be willing to turn my 
notes into a book. I had been asked this before and always said “no” 
because I really didn’t want the hassle. Rochelle promised faithfully 
that she would leave me alone and let me set the timeline, and, 
amazingly, that’s what happened. The result is this book, which tries 
to combine the best of all the things I liked about my lecture notes 
into something other people might be able to use. 


How do I use this book? This book can be used entirely for 
self-study or in a classroom. It is really designed for a one-semester 
course and is geared toward undergraduates who have taken a class 
on proof techniques (usually called Discrete Mathematics). However, 
clever undergraduates can probably read the material with very little 
problem even if they haven’t taken a course on proofs. The book is 
organized into four parts and written in a “theorem-proof-example” 
style. There are remarks throughout as well as chapter notes that 
try to emphasize some history or other applications. Part 1 covers 
introductory material on graphs that would be standard in any graph 
theory class. Part 2 covers algorithms, network flows, and coloring. 
Part 3 covers algebraic graph theory. Finally, Part 4 covers linear pro- 
gramming and network flow problems, including an alternate proof 
for the max-flow/min-cut theorem using the Karush–Kuhn–Tucker
conditions. Each part emphasizes some kind of application, with vertex
ranking (centrality) appearing throughout the book. The coloring
section has a nice proof of the NP-completeness of k-colorability and
demonstrates the connection between mathematical logic and com- 
binatorics. 

Sample curriculum is shown in the following figure. 
I’ve used each possible path through the curriculum at least once
and they all worked for me. 


Classic route: The classic route starts with an introduction to 
graph theory (and some applications), goes through graph algorithms 
(Prim’s, Dijkstra’s, etc.), handles coloring and NP-completeness, and 
then covers algebraic graph theory. The material on linear pro- 
gramming is optional or could be used for extra credit. You can 
choose to take up coloring later, which is how I did it at the Naval
Academy.


Algebraic route: The algebraic route starts with an introduction 
to graph theory and then immediately transitions to algebraic graph 
theory. This introduces students more quickly to the connections 
between graph theory and other areas of mathematics. This also 
allows you to cover all aspects of “centrality” more or less in sequence. 
This path then circles back to cover other graph algorithms. Again, 
the material on linear programming is optional. 


Operations research route: This route follows the algebraic route
but then deviates to cover linear programming immediately after 
covering classic graph algorithms. Using material from both Chap- 
ters 6 and 12 together, the max-flow/min-cut theorem is covered 
through the lens of linear programming, and this is used to derive 
the Edmonds–Karp algorithm. This is how I taught Math 485 the
first few times. 

The book is sensitive to the fact that linear programming can be 
a niche interest in some math departments. Consequently, the cover- 
age of network flows is classical in Chapter 6, which makes it easier 
for instructors who want to follow the classical route. For the more 
adventurous, combining Chapters 6 and 12 (which does require a 
little back-and-forth) does provide a unique perspective on network 
flows that is not usually available at the undergraduate level. Addi- 
tionally, there are two appendices on linear algebra and probability 
to help students who have not had sufficient coverage of these topics. 
This is relevant when dealing with algebraic graph theory. 

My favorite aspect of the book is the applications. Degree, 
betweenness, eigenvector, and PageRank centrality are all covered,
as are spectral clustering and the graph Laplacian. The graph algo- 
rithms are easy to understand with applications to routing, but I also 
discuss arbitrage discovery in currency exchanges (which was a ques- 
tion I was asked during an Amazon interview). Computing whether 
sports teams can make the playoffs is discussed as an application of 
network flows, and exam scheduling illustrates the use of coloring. 
For those who cover the linear programming material, the assign- 
ment problem is discussed along with profit maximization in simple 
companies. 


What isn’t in this book? Since this is geared toward a one- 
semester class, there are several omissions. For those who wish to 
try the linear programming material, the simplex algorithm is not 
presented. Instead, I show how to solve linear programs with a com- 
puter. (However, I do cover optimality conditions and duality.) Pla- 
nar graphs are entirely ignored in the main text, though they are 
briefly discussed (along with the four color theorem) in chapter notes. 
If you’ve found my lecture notes online, you’ll notice I have expunged 
coverage of random graphs in favor of the graph Laplacian. This was 
a difficult decision, but (i) the chapter did not fit into the flow and 
(ii) it was too similar to the coverage by Gross and Yellen in their second
edition of Graph Theory and Its Applications. While I seriously
considered including coverage of edge and vertex spaces (including 
cut and cycle spaces), including sufficient background on direct sums 
in linear algebra would have made the book too long to pass as a 
one-semester treatment. Also, I never managed to cover those in a 
class, so I don’t know how students would react. 


Chapter 1


Introduction to Graph Theory 


Remark 1.1 (Chapter goals). The goal of this chapter is to intro- 
duce the basic concepts of graph theory and its vocabulary. We also 
discuss a little of its history. 


1.1 Graphs, Multigraphs, and Simple Graphs 


Definition 1.2 (Graph). A graph is a tuple G = (V, E) where V
is a (finite) set of vertices and E is a finite collection of edges. The
collection E contains elements from the union of the one- and two-element
subsets of V. That is, each edge is either a one- or two-element subset
of V.


Remark 1.3. It is generally easiest to visualize a graph as a col- 
lection of shapes for its vertices and lines (or curves) connecting the 
vertices for its edges. Though any shape can be used for the vertices, 
in mathematical graph theory, dots or circles are the most common. 


Example 1.4. Consider the set of vertices V = {1,2,3,4} and the 
set of edges 


E = {{1, 2}, {2, 3}, {3, 4}, {4, 1}}.


Then, the graph G = (V, E) has four vertices and four edges. See
Fig. 1.1 for the visual representation of G. 


Fig. 1.1 It is easier for explanations to represent a graph by a diagram in which
vertices are represented by points (or squares, circles, triangles, etc.) and edges 
are represented by lines connecting vertices. 
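
As a minimal sketch (not part of the formal development), the graph in Example 1.4 can be stored on a computer by following Definition 1.2 literally. Python is assumed here purely for convenience, and the names V, E, and is_valid_edge are illustrative choices rather than notation used elsewhere in this book:

```python
# Vertex set of Example 1.4.
V = {1, 2, 3, 4}

# The edge collection is a list (not a set) so that duplicate edges would be
# allowed; each edge is a frozenset, i.e., an unordered subset of V.
E = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4}), frozenset({4, 1})]


def is_valid_edge(e, vertices):
    """Return True if e is a one- or two-element subset of the vertex set."""
    return len(e) in (1, 2) and e <= vertices


assert all(is_valid_edge(e, V) for e in E)
print(len(V), "vertices and", len(E), "edges")
```

Using frozensets makes {1, 2} and {2, 1} the same edge, which matches the idea that an (undirected) edge is a set rather than an ordered pair.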


Fig. 1.2 A self-loop is an edge in a graph G that contains exactly one vertex. 
That is, an edge that is a one-element subset of the vertex set. Self-loops are 
illustrated by loops at the vertex in question. 


Definition 1.5 (Self-loop). If G = (V, E) is a graph and v ∈ V
and e = {v}, then edge e is called a self-loop. That is, any edge that 
is a single-element subset of V is called a self-loop. 


Example 1.6. If we replace the edge set in Example 1.4 with 


E = {{1, 2}, {2, 3}, {3, 4}, {4, 1}, {1}},


then the visual representation of the graph includes a self-loop that 
starts and ends at Vertex 1. This is illustrated in Fig. 1.2. 


Definition 1.7 (Vertex adjacency). Let G = (V, E) be a graph.
Two vertices v1 and v2 are said to be adjacent if there exists an edge
e ∈ E so that e = {v1, v2}. A vertex v is self-adjacent if e = {v} is
an element of E.


Definition 1.8 (Edge adjacency). Let G = (V, E) be a graph.
Two edges e1 and e2 are said to be adjacent if there exists a vertex
v so that v is an element of both e1 and e2 (as sets). An edge e is
said to be adjacent to a vertex v if v is an element of e as a set.


Definition 1.9 (Neighborhood). Let G = (V, E) be a graph, and
let v ∈ V. The neighbors of v are the set of vertices that are adjacent
to v. Formally,


N(v) = {u ∈ V : ∃e ∈ E (e = {u, v} or (u = v and e = {v}))}. (1.1)


In some texts, N(v) is called the open neighborhood of v, while N[v] =
N(v) ∪ {v} is called the closed neighborhood of v. This notation is
somewhat rare in practice. When v is an element of more than one
graph, we write N_G(v) as the neighborhood of v in graph G.


Remark 1.10. Equation (1.1) is read: 


N(v) is the set of vertices u in (the set) V such that there exists
an edge e in (the set) E so that e = {u, v} or u = v and e = {v}.


The logical expression ∃x (R(x)) is always read in this way; that
is, there exists x so that some statement R(x) holds. Similarly, the
logical expression ∀y (R(y)) is read:


For all y the statement R(y) holds. 


Admittedly, this sort of thing is very pedantic, but logical nota- 
tion can help immensely in simplifying complex mathematical 
expressions. 


Remark 1.11. The difference between the open and closed neigh- 
borhoods of a vertex can get a bit odd when you have a graph with 
self-loops. Since this is a highly specialized case, usually the author 
(of the paper, book, etc.) will specify a behavior. 


Example 1.12. In the graph in Example 1.4, the neighborhood of
Vertex 1 consists of Vertices 2 and 4. In the graph in Example 1.6,
Vertex 1 is also adjacent to itself, so the neighborhood of Vertex 1
now consists of Vertices 1, 2, and 4. Note that the self-loop forces
Vertex 1 into its own neighborhood.
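
Equation (1.1) translates almost word for word into code. The following is only a sketch: it assumes edges are stored as Python frozensets, and the names neighborhood, E_simple, and E_loop are hypothetical. The two edge collections are those of Examples 1.4 and 1.6.

```python
def neighborhood(v, edges):
    """Return N(v) as in Eq. (1.1).

    A vertex u is included if some edge equals {u, v}; v itself is
    included only when the self-loop {v} is present.
    """
    nbrs = set()
    for e in edges:
        if v in e:
            if len(e) == 1:
                nbrs.add(v)       # the self-loop {v}
            else:
                nbrs |= e - {v}   # the other endpoint of {u, v}
    return nbrs


E_simple = [frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]]
E_loop = E_simple + [frozenset({1})]

print(sorted(neighborhood(1, E_simple)))  # [2, 4]
print(sorted(neighborhood(1, E_loop)))    # [1, 2, 4]
```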


Definition 1.13 (Degree). Let G = (V, E) be a graph, and let
v ∈ V. The degree of v, written deg(v), is the number of non-self-loop
edges adjacent to v plus two times the number of self-loops
defined at v. More formally,


deg(v) = |{e ∈ E : ∃u ∈ V (e = {u, v})}| + 2|{e ∈ E : e = {v}}|.

Here, if S is a set, then |S| is the cardinality of that set.


Remark 1.14. Note that each vertex in the graph in Fig. 1.1 has a 
degree of 2. 


Example 1.15. Consider the graph shown in Example 1.6. The 
degree of Vertex 1 is 4. We obtain this by counting the number of 
non-self-loop edges adjacent to Vertex 1 (there are 2) and adding 
two times the number of self-loops at Vertex 1 (there is 1) to obtain 
2 + 2 × 1 = 4.
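
The count in Example 1.15 can be reproduced with a few lines of code. This is a sketch under the assumption that edges are stored as frozensets; the function name degree is illustrative, and it follows the verbal form of Definition 1.13 (non-self-loop edges count once, self-loops count twice).

```python
def degree(v, edges):
    """deg(v): non-self-loop edges at v, plus two per self-loop at v."""
    non_loops = sum(1 for e in edges if v in e and len(e) == 2)
    loops = sum(1 for e in edges if e == frozenset({v}))
    return non_loops + 2 * loops


# Edge collection of Example 1.6 (a 4-cycle with a self-loop at Vertex 1).
E = [frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]] + [frozenset({1})]

print([degree(v, E) for v in (1, 2, 3, 4)])  # [4, 2, 2, 2]
```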


Remark 1.16. Let G = (V, E) be a graph. There are two degree
values that are of interest in graph theory: The largest and smallest
vertex degrees are usually denoted by Δ(G) and δ(G), respectively.
That is,


Δ(G) = max_{v ∈ V} deg(v) and (1.2)
δ(G) = min_{v ∈ V} deg(v). (1.3)


Definition 1.17 (Multigraph). A graph G = (V, E) is a multigraph
if there are two edges e1 and e2 in E so that e1 and e2 are
equal as sets. That is, there are two vertices v1 and v2 in V so that
e1 = e2 = {v1, v2}.


Remark 1.18. Note in the definition of graph (Definition 1.2), we
were very careful to specify that E is a collection of one- and two-element
subsets of V rather than to say that E was a set. This allows
us to have duplicate edges in the edge set and thus to define multigraphs.
In computer science, a set that may have duplicate entries is
sometimes called a multiset. A multigraph is a graph in which E is
a multiset.


Derivation 1.19 (Königsberg bridge problem). Graph theory
began with Leonhard Euler and his study of the bridges of
Königsberg problem. Here’s how it started: The city of Königsberg
exists as a collection of islands connected by bridges. The problem
Euler wanted to analyze was: Is it possible to go from island to island,
traversing each bridge only once?

Following Euler, we construct a graph to analyze the bridges of 
Königsberg problem. Assume that we treat each island as a vertex
and each bridge as an edge. The resulting multigraph is illustrated 
in Fig. 1.3. The edge collection is 


E = {{A, B},{A, B}, {A,C}, {A, C}, {A, D}, {B, D}, {C, D}}. 


This multigraph occurs because there are two bridges connecting 
island A with island B and two bridges connecting island A with 
island C. If two vertices are connected by two (or more) edges, then 
the edges are simply represented as parallel lines (or arcs) connecting 
the vertices. 

Note that this representation dramatically simplifies the analysis 
of the problem in so far as we can now focus only on the structural 
properties of this graph. It’s easy to see (from Fig. 1.3) that each 
vertex has an odd degree. More importantly, since we are trying 
to traverse islands without ever recrossing the same bridge (edge), 
when we enter an island (say C), we use one of the three edges. 
Unless this is our final destination, we must use another edge to leave 
C. Additionally, assuming we have not crossed all the bridges yet, 
we know that we must leave C. That means that the third edge that
touches C must be used to return to C a final time. Alternatively, we
could start at Island C and then return once and never come back.
Put simply, our trip around the bridges of Königsberg had better
start or end at Island C. But Islands (vertices) B and D also have
this property. We can’t start and end our travels over the bridges on
Islands C, B, and D simultaneously; therefore, no such walk around
the islands in which we cross each bridge precisely once is possible.


Fig. 1.3 Representing each island as a dot and each bridge as a line or curve
connecting the dots simplifies the visual representation of the seven Königsberg
bridges.
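
The degree count behind the argument in Derivation 1.19 is easy to check by machine. The following is only a sketch: it assumes Python and its standard collections.Counter, and the variable names bridges, deg, and odd are illustrative. The edge list is the edge collection E given in the derivation.

```python
from collections import Counter

# The seven bridges, kept in a list so that duplicate edges survive.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

deg = Counter()
for u, v in bridges:
    deg[u] += 1
    deg[v] += 1

# Vertices at which a walk using every bridge once must start or end.
odd = sorted(v for v in deg if deg[v] % 2 == 1)

print(dict(deg))  # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(odd)        # ['A', 'B', 'C', 'D']
```

Every vertex has odd degree, so each of the four islands would have to be the start or the end of the walk, which is impossible.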


Definition 1.20 (Simple graph). A graph G = (V, E) is a simple
graph if G has no edges that are self-loops and if E is a subset of
the two-element subsets of V, i.e., G is not a multigraph.


Remark 1.21. Most of graph theory is not concerned with graphs 
containing self-loops or with multigraphs. Thus, we assume that
every graph we discuss from this point on is a simple graph, and we 
use the term graph to mean simple graph. When a particular result 
holds in a more general setting, we state it explicitly. 


1.2 Directed Graphs 


Definition 1.22 (Directed graph). A directed graph (digraph) is
a tuple G = (V, E) where V is a (finite) set of vertices and E is a
collection of elements contained in V × V. That is, E is a collection
of ordered pairs of vertices. The edges in E are called directed edges
to distinguish them from those edges in Definition 1.2.


Definition 1.23 (Source/Destination). Let G = (V,E) be a 
directed graph. The source (or tail) of the (directed) edge e = (v1, v2) 
is v1, while the destination (or sink or head) of the edge is v2. 


Remark 1.24. A directed graph (digraph) differs from a graph only 
insofar as we replace the concept of an edge as a set with the idea that 
an edge is an ordered pair in which the ordering gives some notion 
of the direction of a flow. In the context of a digraph, a self-loop is 
an ordered pair with form (v,v). We can define a multi-digraph if 
we allow E to be a true collection (rather than a set) that
contains multiple copies of an ordered pair. 


Remark 1.25. It is worth noting that the ordered pair (v1, v2) is 
distinct from the pair (v2, v1). Thus, if a digraph G = (V, E) has
both (v1, v2) and (v2, v1) in its edge set, it is not a multi-digraph. 


Fig. 1.4 (a) A directed graph and (b) a directed graph with a self-loop. In a
directed graph, edges are directed; that is, they are ordered pairs of elements 
drawn from the vertex set. The ordering of the pair gives the direction of the 
edge. 


Example 1.26. We can modify the figures in Example 1.4 to make 
them directed. Suppose we have the directed graph with vertex set 
V = {1,2,3,4} and edge set 


E = {(1, 2), (2, 3), (3, 4), (4, 1)}.


This digraph is visualized in Fig. 1.4(a). In drawing a digraph, 
we simply append arrowheads to the destination associated with a 
directed edge. 

We can likewise modify our self-loop example to make it directed.
In this case, our edge set becomes


E = {(1, 2), (2, 3), (3, 4), (4, 1), (1, 1)}.


This is shown in Fig. 1.4(b).


Definition 1.27 (In-degree, out-degree). Let G = (V, E) be a
digraph. The in-degree of a vertex v in G is the total number of edges
in E with destination v. The out-degree of v is the total number of
edges in E with source v. We denote the in-degree of v by deg_in(v)
and the out-degree by deg_out(v).
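
As a small illustration of Definition 1.27, the in- and out-degrees of the digraph with a self-loop from Example 1.26 can be tallied as follows. This is a sketch assuming Python's collections.Counter; the names deg_in and deg_out simply mirror the notation of the definition.

```python
from collections import Counter

# Directed edges (source, destination), including the self-loop (1, 1).
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 1)]

deg_out = Counter(src for src, _ in E)   # edges leaving each vertex
deg_in = Counter(dst for _, dst in E)    # edges entering each vertex

print(deg_in[1], deg_out[1])  # 2 2
```

The directed self-loop (1, 1) has Vertex 1 as both its source and its destination, so it adds one to the in-degree and one to the out-degree of Vertex 1.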


Definition 1.28 (Underlying graph). If G = (V, E) is a
digraph, then the underlying graph of G is a (multi)graph (with 
self-loops) that results when each directed edge (v1, v2) is replaced 
by the set {v1, v2}, thus making the edge nondirectional. Naturally, 
if the directed edge is a directed self-loop (v,v), then it is replaced 
by the singleton set {v}. 
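
The replacement described in Definition 1.28 can be sketched in one line of code. The function name underlying_edges and the sample edge list below are hypothetical; the sketch assumes directed edges are stored as Python tuples.

```python
def underlying_edges(directed_edges):
    """Replace each ordered pair (v1, v2) by the set {v1, v2}.

    A directed self-loop (v, v) collapses to the singleton {v} because
    a frozenset discards the repeated element.
    """
    return [frozenset(e) for e in directed_edges]


E = [(1, 2), (2, 1), (1, 1)]
U = underlying_edges(E)

# Keeping the list gives a multigraph; converting to a set merges duplicates.
print(len(U), len(set(U)))  # 3 2
```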


Remark 1.29. Notions like edge and vertex adjacency and neighborhood
can be extended to digraphs by simply defining them with
respect to the underlying graph of a digraph. Thus, the neighborhood 
of a vertex v in a digraph G is N(v) computed in the underlying 
graph. 


Remark 1.30. Whether the underlying graph of a digraph is a 
multigraph or not usually has no bearing on relevant properties. In 
general, an author will state whether two directed edges (v1, v2) and 
(v2, v1) are combined into a single set {v1, v2} or two sets in a multiset.
As a rule of thumb, multi-digraphs will have underlying multigraphs,
while digraphs generally have underlying graphs that are not
multigraphs.


Remark 1.31. It is possible to mix (undirected) edges and directed 
edges together into a very general definition of a graph with both 
undirected and directed edges. This usually only occurs in specific 
models, and we will not consider such graphs. For the remainder of 
this book, unless otherwise stated: 


(1) When we say graph, we mean a simple graph, as in Remark 1.21. 

(2) When we say digraph, we mean a directed graph G = (V, E), in
which every edge is a directed edge and the component E is a set.
For practical purposes (as in when we discuss Markov chains), 
we allow self-loops in digraphs. 


1.3 Chapter Notes


Leonhard Euler (1707-1783), the father of graph theory, was born in 
Basel, Switzerland, and studied under the Bernoullis. He is known as 
one of the greatest mathematicians of all time. He is one of the most 
prolific mathematical writers in history, with over 850 papers, some 
published after his death [1]. Euler conceived of the basic elements 
of graph theory after being presented with the bridges of Königsberg
problem, as detailed in this letter to Giovanni Jacopo Marinoni [2]: 


A problem was posed to me about an island in the city of 
Königsberg, surrounded by a river spanned by seven bridges,
and I was asked whether someone could traverse the separate 
bridges in a connected walk in such a way that each bridge 
is crossed only once. I was informed that hitherto no-one had 
demonstrated the possibility of doing this, or shown that it is
impossible. This question is so banal, but seemed to me worthy 
of attention in that geometry, nor algebra, nor even the art of 
counting was sufficient to solve it. In view of this, it occurred 
to me to wonder whether it belonged to the geometry of position 
[geometriam Situs], which Leibniz had once so much longed for. 
And so, after some deliberation, I obtained a simple, yet com- 
pletely established, rule with whose help one can immediately 
decide for all examples of this kind, with any number of bridges 
in any arrangement, whether such a round trip is possible, 
or not... 


Leonhard Euler
Letter to Giovanni Jacopo Marinoni 
1736 


1.4 Exercises 


Exercise 1.1 

In an online social network, Alice is friends with Bob and Charlie. 
Charlie is friends with David and Edward. Edward is friends with 
Bob. Draw a graph to represent this social network. 


Exercise 1.2 
Find the neighborhoods and degrees for each vertex in the graph 
shown in Fig. 1.5. 


Fig. 1.5 Graph for Exercise 1.2. 


Exercise 1.3

Since Euler’s work, two of the seven bridges in Königsberg have
been destroyed (during World War II). Another two were replaced 
by major highways, but they are still (for all intents and purposes) 
bridges. The remaining three are still intact (see Fig. 1.6). Deter- 
mine whether it is possible to visit the bridges traversing each bridge 
exactly once. If so, find such a sequence of edges. 


Fig. 1.6 During World War II, two of the seven original Königsberg bridges were
destroyed. Later, two more were made into modern highways (but they are still 
bridges). Is it now possible to go from island to island, traversing each bridge only 
once? 


Exercise 1.4 

Consider the new bridges of Königsberg problem from Exercise 1.3.
Is the graph representation of this problem a simple graph? Could a
self-loop exist in a graph derived from a bridges-of-Königsberg-type
problem? If so, what would it mean? If not, why? 


Exercise 1.5 
Prove that for simple graphs, the degree of a vertex is simply the 
cardinality of its (open) neighborhood. 


Exercise 1.6 

Suppose in the new bridges of Königsberg problem (from
Exercise 1.3), some of the bridges are to become one way. Find a 
way of replacing the edges in the graph you obtained in solving 
Exercise 1.3 with directed edges so that the graph becomes a digraph, 
but it is still possible to tour all the islands without crossing the same 
bridge twice. Is it possible to directionalize the edges so that a tour
in which each bridge is crossed once is not possible, but it is still 
possible to enter and exit each island? If so, do it. If not, prove that 
it is not possible. [Hint: In this case, enumeration is not that hard 
and is the most straightforward. You can use symmetry to shorten 
your argument substantially.]


Chapter 2


Degree Sequences and Subgraphs 


Remark 2.1 (Chapter goals). In this chapter, we introduce the 
idea of a degree sequence. We then discuss graph families with special
degree sequences, prove the Havel–Hakimi theorem, and discuss
subgraphs. We conclude by discussing cliques, independent sets, and 
vertex covers. 


2.1 Degree Sequences 


Definition 2.2 (Degree sequence). Let G = (V, E) be a graph,
with |V| = n. The degree sequence of G is a tuple d ∈ Z^n composed
of the degrees of the vertices in V arranged in decreasing order.


Example 2.3. Consider the graph in Fig. 2.1. The degrees for the 
vertices of this graph are: 


(1) deg(v_1) = 4,
(2) deg(v_2) = 3,
(3) deg(v_3) = 2,
(4) deg(v_4) = 2, and
(5) deg(v_5) = 1.


This leads to the degree sequence d = (4, 3, 2, 2, 1).


Definition 2.4 (Empty and trivial graphs). A graph G =
(V, E) in which V = ∅ is called an empty graph (or null graph).
A graph in which V = {v} and E = ∅ is called a trivial graph.


Fig. 2.1 The graph above has a degree sequence d = (4, 3, 2, 2, 1). These are the
degrees of the vertices in the graph arranged in decreasing order.


Remark 2.5. An empty graph has an empty degree sequence. 
A trivial graph has a degree sequence composed of all zeros. 


Definition 2.6 (Isolated vertex). Let G = (V, E) be a graph, and
let v ∈ V. If deg(v) = 0, then v is said to be isolated.


Remark 2.7. Note that Definition 2.6 applies only when G is a
simple graph. If G is a general graph (one with self-loops), then v
is still isolated even when {v} ∈ E; that is, there is a self-loop at
vertex v and no other edges are adjacent to v. In this case, however,
deg(v) = 2.


Assumption 1 (Pigeonhole principle). Suppose that items may 
be classified according to m possible types, and we are given n > m 
items. Then, there are at least two items with the same type. 


Remark 2.8. The pigeonhole principle was originally formulated by 
thinking of placing m+ 1 pigeons into m pigeon holes. Clearly, to 
place all the pigeons in the holes, one hole must have two pigeons in 
it. The holes are the types (each hole is a different type), and the
pigeons are the objects. Another good example deals with gloves: 
There are two types of gloves (left handed and right handed). If 
I hand you three gloves (the objects), then you either have two left- 
handed gloves or two right-handed gloves. 


Theorem 2.9. Let G = (V,E) be a nonempty, nontrivial graph. 
Then, G has at least one pair of vertices with equal degree. 


Proof. This proof uses the pigeonhole principle and is illustrated
by the graph in Fig. 2.1, where deg(v_3) = deg(v_4). The types will be
the possible vertex degree values, and the objects will be the vertices.

Suppose |V| = n. Each vertex could have a degree between 0 and
n − 1 (for a total of n possible degrees), but if the graph has a vertex
of degree 0, then it cannot have a vertex of degree n − 1, since a vertex
of degree n − 1 is adjacent to every other vertex. Therefore,
there are at most n − 1 possible degree values, depending on
whether the graph has an isolated vertex or a vertex with degree
n − 1 (if it has neither, there are even fewer than n − 1 possible degree
values). Thus, by the pigeonhole principle, at least two vertices must
have the same degree.
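
For small graphs, Theorem 2.9 can also be confirmed by brute force. The sketch below is a sanity check and not a substitute for the proof: it enumerates every simple graph on n labeled vertices (here vertices are numbered from 0, and the function names are illustrative) and tests whether two vertices share a degree.

```python
from itertools import combinations


def has_repeated_degree(n, edges):
    """Return True if two of the n vertices have the same degree."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return len(set(deg)) < n


def check_all_simple_graphs(n):
    """Test every simple graph on n labeled vertices."""
    pairs = list(combinations(range(n), 2))
    for mask in range(2 ** len(pairs)):
        # Bit i of mask decides whether the i-th possible edge is present.
        edges = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
        if not has_repeated_degree(n, edges):
            return False
    return True


print(all(check_all_simple_graphs(n) for n in range(2, 6)))  # True
```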


Theorem 2.10. Let G = (V, E) be a (general) graph. Then,


2|E| = Σ_{v ∈ V} deg(v). (2.1)


Proof. Consider two vertices v_1 and v_2 in V. If e = {v_1, v_2}, then
a +1 is contributed to Σ_{v ∈ V} deg(v) for both v_1 and v_2. Thus, every
non-self-loop edge contributes +2 to the vertex degree sum. On the
other hand, if e = {v_1} is a self-loop, then this edge contributes +2
to the degree of v_1. Therefore, each edge contributes exactly +2 to
the vertex degree sum. Equation (2.1) follows immediately.
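
The counting in this proof can be checked on the graph with a self-loop from Example 1.6. This is a sketch with illustrative names, assuming edges are stored as frozensets so that a self-loop is a one-element set.

```python
V = {1, 2, 3, 4}
E = [frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]] + [frozenset({1})]


def degree(v, edges):
    """Each ordinary edge at v counts once; each self-loop at v counts twice."""
    return sum(2 if len(e) == 1 else 1 for e in edges if v in e)


total = sum(degree(v, E) for v in V)
print(total, 2 * len(E))  # 10 10
```

The self-loop is the reason the degree convention counts it twice: with that convention, every edge adds exactly 2 to the degree sum.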


Corollary 2.11. Let G = (V,E). Then, there are an even number 
of vertices in V with odd degree. 


Theorem 2.12. Let G = (V, E) be a digraph. Then, the following
holds:


|E| = Σ_{v ∈ V} deg_in(v) = Σ_{v ∈ V} deg_out(v). (2.2)


Definition 2.13 (Graphic sequence). Let d = (d_1, ..., d_n) be a
tuple in Z^n, with d_1 ≥ d_2 ≥ ··· ≥ d_n. Then, d is graphic if there
exists a graph G with degree sequence d.


Remark 2.14. See the chapter notes for a discussion on the appli- 
cations of graphs with specific kinds of degree distributions. 


Corollary 2.15. If d is graphic, then the sum of its elements is
even. 


Lemma 2.16. Let d = (d_1, ..., d_n) be a graphic degree sequence.
Then, there exists a graph G = (V, E) with degree sequence d so that
if V = {v_1, ..., v_n}, then:


(1) deg(v_i) = d_i for i = 1, ..., n and
(2) v_1 is adjacent to vertices v_2, ..., v_{d_1+1}.


Proof. The fact that d is graphic means there is at least one graph
whose degree sequence is equal to d. From among all those graphs,
choose G = (V, E) to maximize


r = |N(v_1) ∩ {v_2, ..., v_{d_1+1}}|. (2.3)


Recall that N(v_1) is the neighborhood of v_1. Thus, maximizing
Eq. (2.3) implies that we are attempting to make sure that as many
vertices in the set {v_2, ..., v_{d_1+1}} are adjacent to v_1 as possible.

If r = d_1, then the lemma is proved since v_1 is adjacent to
v_2, ..., v_{d_1+1}. Now, proceed by contradiction and assume r < d_1.
We know the following things:


(1) Since deg(v_1) = d_1, there must be a vertex v_t with t > d_1 + 1 so
that v_t is adjacent to v_1.

(2) Moreover, there is a vertex v_s with 2 ≤ s ≤ d_1 + 1 that is not
adjacent to v_1.

(3) By the ordering of V, deg(v_s) ≥ deg(v_t); that is, d_s ≥ d_t.

(4) Therefore, there is some vertex v_k ∈ V so that v_s is adjacent to
v_k but v_t is not because v_t is adjacent to v_1 and v_s is not, and
the degree of v_s is at least as large as the degree of v_t.


Let us create a new graph G’ = (V, E’). The edge set E’ is con- 
structed from E by: 


(1) removing edge {v_1, v_t},
(2) removing edge {v_s, v_k},
(3) adding edge {v_1, v_s}, and
(4) adding edge {v_t, v_k}.


Fig. 2.2 We construct a new graph G’ from G that has a larger value r (see
Eq. (2.3)) than our original graph G did. This contradicts our assumption that 
G was chosen to maximize r. 


This is illustrated in Fig. 2.2. In this construction, the degrees of v_1,
v_t, v_s, and v_k are preserved. However, it is clear that in G’,


r’ = |N_{G’}(v_1) ∩ {v_2, ..., v_{d_1+1}}|,


and we have r’ > r. This contradicts our initial choice of G and
proves the lemma.
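
The edge swap used in this proof can be traced on a small concrete graph. The graph below is a hypothetical example chosen for illustration (it is not one of the figures): vertex i plays the role of v_i, the degree sequence is (2, 2, 2, 1, 1), and the swap uses t = 4, s = 3, and k = 5. The function names are likewise illustrative.

```python
from collections import Counter


def degrees(edges):
    """Map each vertex to its degree in a simple graph."""
    deg = Counter()
    for e in edges:
        for v in e:
            deg[v] += 1
    return deg


def r_value(edges, d1):
    """Compute r from Eq. (2.3): neighbors of vertex 1 among 2, ..., d1 + 1."""
    nbrs = {u for e in edges if 1 in e for u in e if u != 1}
    return len(nbrs & set(range(2, d1 + 2)))


E = {frozenset(p) for p in [(1, 2), (1, 4), (2, 3), (3, 5)]}

# Remove {v_1, v_t} and {v_s, v_k}; add {v_1, v_s} and {v_t, v_k}.
removed = {frozenset({1, 4}), frozenset({3, 5})}
added = {frozenset({1, 3}), frozenset({4, 5})}
E_new = (E - removed) | added

print(degrees(E) == degrees(E_new))      # True
print(r_value(E, 2), r_value(E_new, 2))  # 1 2
```

The degrees are unchanged while r increases from 1 to 2, which is exactly the behavior the proof relies on.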


Theorem 2.17 (Havel–Hakimi theorem). A degree sequence
d = (d_1, ..., d_n) is graphic if and only if the sequence (d_2 − 1, ...,
d_{d_1+1} − 1, d_{d_1+2}, ..., d_n) is graphic.


Proof. ($\Rightarrow$) Suppose that $d = (d_1, \ldots, d_n)$ is graphic. Then, by
Lemma 2.16, there is a graph G with degree sequence d so that:

(1) $\deg(v_i) = d_i$ for $i = 1, \ldots, n$; and
(2) $v_1$ is adjacent to vertices $v_2, \ldots, v_{d_1+1}$.


If we remove vertex $v_1$ and all edges containing $v_1$ from this graph
G to obtain G’, then in G’, for all $i \in \{2, \ldots, d_1 + 1\}$, the degree of $v_i$
is $d_i - 1$, while for $j \in \{d_1 + 2, \ldots, n\}$, the degree of $v_j$ is $d_j$ because
$v_1$ is not adjacent to $v_{d_1+2}, \ldots, v_n$ by choice of G. Thus, G’ has a
degree sequence of $(d_2 - 1, \ldots, d_{d_1+1} - 1, d_{d_1+2}, \ldots, d_n)$, and thus this
sequence is graphic.


($\Leftarrow$) Now, suppose that $(d_2 - 1, \ldots, d_{d_1+1} - 1, d_{d_1+2}, \ldots, d_n)$ is
graphic. Then, there is some graph G that has this as its degree
sequence. We can construct a new graph G’ from G by adding a
vertex $v_1$ to G and creating an edge from $v_1$ to each vertex from
$v_2$ to $v_{d_1+1}$. It is clear that the degree of $v_1$ is $d_1$, while the degrees
of all other vertices $v_i$ must be $d_i$, and thus, $d = (d_1, \ldots, d_n)$ is
graphic because it is the degree sequence of G’. This completes the
proof. 


Remark 2.18. Naturally, one might have to rearrange the ordering
of the degree sequence $(d_2 - 1, \ldots, d_{d_1+1} - 1, d_{d_1+2}, \ldots, d_n)$ to ensure
it is in descending order.


Example 2.19. Consider the degree sequence d = (5, 5, 4, 3, 2, 1).
One might ask if this degree sequence is graphic. Note that 5 + 5 +
4 + 3 + 2 + 1 = 20, so, at least, the necessary condition that the
degree sequence sum to an even number is satisfied. In this d, we
have $d_1 = 5$, $d_2 = 5$, $d_3 = 4$, $d_4 = 3$, $d_5 = 2$, and $d_6 = 1$.

Applying the Havel-Hakimi theorem, we know that this degree
sequence is graphic if and only if d’ = (4, 3, 2, 1, 0) is graphic. Note
that this is $(d_2 - 1, d_3 - 1, d_4 - 1, d_5 - 1, d_6 - 1)$ since $d_1 + 1 = 5 + 1 = 6$.
Now, if d’ were graphic, then we would have a graph
with five vertices, one of which has a degree of 4 and another that has
a degree of 0, and no two vertices have the same degree. Applying
Theorem 2.9 (or its proof), we see that this is not possible.
Thus, d’ is not graphic, and so, d is not graphic.
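
The reduction used in this example can be repeated mechanically. The following Python sketch is one possible iterative reading of Theorem 2.17 together with Remark 2.18; the function name `is_graphic` is an assumption of this sketch, not notation from the text. Each pass removes the largest entry $d_1$, subtracts 1 from the next $d_1$ entries, and re-sorts the result into descending order.

```python
def is_graphic(degrees):
    """Test a degree sequence by repeatedly applying the Havel-Hakimi reduction."""
    seq = sorted(degrees, reverse=True)
    if any(d < 0 for d in seq):
        return False
    while seq and seq[0] > 0:
        d1 = seq.pop(0)              # remove the largest degree d_1
        if d1 > len(seq):            # not enough vertices left to attach to
            return False
        for i in range(d1):          # subtract 1 from the next d_1 entries
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)       # restore descending order (Remark 2.18)
    return True


print(is_graphic([5, 5, 4, 3, 2, 1]))   # the sequence d of Example 2.19
print(is_graphic([3, 3, 3, 3]))         # the degree sequence of K_4
```

The loop stops as soon as the remaining sequence is all zeros, which is the degree sequence of a graph with no edges.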


Remark 2.20. There are several proofs of the next theorem, but
they are outside the scope of this text. See Ref. [3] for a short
inductive proof.


Theorem 2.21 (Erdős-Gallai theorem). A degree sequence
$d = (d_1, \ldots, d_n)$, listed in descending order, is graphic if and only if its
sum is even and for all $1 \leq k \leq n$,

$$\sum_{i=1}^{k} d_i \leq k(k-1) + \sum_{i=k+1}^{n} \min\{k, d_i\}. \tag{2.4}$$
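
The condition in this theorem is easy to check by machine. The sketch below is an illustration under stated assumptions: the function name `erdos_gallai` is hypothetical, and the inequality is used in its commonly stated form, $\sum_{i \leq k} d_i \leq k(k-1) + \sum_{i > k} \min\{k, d_i\}$ for $1 \leq k \leq n$, with the sequence sorted into descending order first.

```python
def erdos_gallai(degrees):
    """Check the Erdos-Gallai conditions for a sequence of non-negative integers."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for k in range(1, n + 1):
        left = sum(d[:k])                                    # d_1 + ... + d_k
        right = k * (k - 1) + sum(min(k, x) for x in d[k:])  # bound for this k
        if left > right:
            return False
    return True


print(erdos_gallai([5, 5, 4, 3, 2, 1]))   # the sequence of Example 2.19
print(erdos_gallai([3, 3, 3, 3]))
```

For the sequence of Example 2.19, the inequality already fails at k = 2, where the left side is 10 and the right side is 2 + (2 + 2 + 2 + 1) = 9.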


2.2 Some Types of Graphs from Degree Sequences 


Definition 2.22 (Complete graph). Let G = (V, E) be a graph
with |V| = n and $n \geq 1$. If the degree sequence of G is (n − 1,
n − 1, ..., n − 1), then G is called a complete graph on n vertices and
is denoted by $K_n$. In a complete graph on n vertices, each vertex is
connected to every other vertex by an edge.


Lemma 2.23. Let $K_n = (V, E)$ be the complete graph on n vertices.
Then,

$$|E| = \frac{n(n-1)}{2}.$$


Corollary 2.24. Let G = (V, E) be a graph, and let |V| = n. Then,

$$0 \leq |E| \leq \binom{n}{2}.$$


Definition 2.25 (Regular graph). Let G = (V, E) be a graph
with |V| = n. If the degree sequence of G is (k, k, ..., k) with
$k \leq n - 1$, then G is called a k-regular graph on n vertices.


Example 2.26. We illustrate one complete graph and two (non- 
complete) regular graphs in Fig. 2.3. Obviously, every complete graph 
is a regular graph. Every Platonic solid is also a regular graph, but 
not every regular graph is a Platonic solid. In Fig. 2.3(c), we show a 
flattened dodecahedron, one of the five Platonic solids from classical
geometry. The Petersen graph (Fig. 2.3(b)) is a 3-regular graph that
is used in many graph-theoretic examples.


Fig. 2.3. (a) The complete graph K4, (b) the “Petersen graph,” and (c) the 
dodecahedron. All Platonic solids are three-dimensional representations of regular 
graphs, but not all regular graphs are Platonic solids. These figures were generated 
with Maple. 
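
The regularity claims in Example 2.26 can be checked directly from degree sequences. The Python sketch below is illustrative: the function names and the vertex labels 0 to 9 are assumptions of this sketch and need not match the labels in Fig. 2.3. It builds the Petersen graph as an outer 5-cycle, an inner pentagram, and five spokes, builds $K_n$ as all pairs of vertices, and computes the degree sequence of each.

```python
from itertools import combinations


def petersen_edges():
    """Edge set of the Petersen graph on vertices 0..9 (labels are assumed)."""
    outer = [(i, (i + 1) % 5) for i in range(5)]           # outer 5-cycle
    spokes = [(i, i + 5) for i in range(5)]                # outer-to-inner edges
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]   # inner pentagram
    return {frozenset(e) for e in outer + spokes + inner}


def complete_graph_edges(n):
    """Edge set of K_n on vertices 0..n-1: every pair of distinct vertices."""
    return {frozenset(pair) for pair in combinations(range(n), 2)}


def degree_sequence(vertices, edges):
    """Degrees of the given vertices, listed in descending order."""
    return sorted((sum(1 for e in edges if v in e) for v in vertices), reverse=True)


print(degree_sequence(range(10), petersen_edges()))       # k-regular with k = 3?
print(degree_sequence(range(4), complete_graph_edges(4)))
print(len(complete_graph_edges(4)) == 4 * 3 // 2)         # n(n-1)/2 edges in K_n
```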


2.3. Subgraphs 


Definition 2.27 (Subgraph). Let G = (V, E). A graph H =
(V’, E’) is a subgraph of G if $V' \subseteq V$ and $E' \subseteq E$. The subgraph
H is proper if $V' \subsetneq V$ or $E' \subsetneq E$.


Example 2.28. We illustrate the notion of a subgraph in Fig. 2.4. 
The Petersen graph is shown on the left. A subgraph containing 
vertices 1-5 is highlighted in the middle. On the right, we show
that subgraph of the Petersen graph on its own.


Definition 2.29 (Spanning subgraph). Let G = (V, E) be a
graph and H = (V’, E’) be a subgraph of G. The subgraph H is
a spanning subgraph of G if V’ = V.


Definition 2.30 (Edge-induced subgraph). Let G = (V, E) be
a graph. If $E' \subseteq E$, the subgraph of G induced by E’ is the graph
H = (V’, E’), where $v \in V'$ if and only if v appears in an edge in E’.


Example 2.31. Using the Petersen graph, we illustrate a subgraph
induced by a vertex subset and a spanning subgraph. In
Fig. 2.4(b), we illustrate the subgraph induced by the vertex subset
V’ = {1, 2, 3, 4, 5} (highlighted). In Fig. 2.5, we have a spanning
subgraph induced by the edge subset

$$E' = \{\{1,3\}, \{1,4\}, \{1,6\}, \{2,4\}, \{2,5\}, \{2,7\}, \{3,5\}, \{3,8\}, \{4,9\}, \{5,10\}\}.$$


Fig. 2.4 (a) The Petersen graph is shown with (b) a subgraph highlighted and
(c) that subgraph displayed on its own. A subgraph of a graph is another graph
whose vertices and edges are subcollections of those of the original graph.


Fig. 2.5 The spanning subgraph is induced by the edge subset E’ = {{1, 3},
{1, 4}, {1, 6}, {2, 4}, {2, 5}, {2, 7}, {3, 5}, {3, 8}, {4, 9}, {5, 10}}.


Definition 2.32 (Vertex-induced subgraph). Let G = (V, E)
be a graph. If $V' \subseteq V$, the subgraph of G induced by V’ is the
graph H = (V’, E’), where $\{v_1, v_2\} \in E'$ if and only if $\{v_1, v_2\} \in E$ and
$v_1$ and $v_2$ are both in V’.


Remark 2.33. For directed graphs, all subgraph definitions are
modified in the obvious way. Edges become directed as one would 
expect. 
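
The two kinds of induced subgraph can be compared on the Petersen graph. The Python sketch below is illustrative: the function names are hypothetical, and the labeling of the outer 5-cycle on vertices 6 to 10 is an assumption chosen to be compatible with the edge subset listed for Fig. 2.5 (the pentagram on vertices 1 to 5 and the spokes {i, i + 5} are taken from that list).

```python
def vertex_induced(edges, vertex_subset):
    """Subgraph induced by a vertex subset: keep edges with both ends in it."""
    keep = set(vertex_subset)
    return keep, {e for e in edges if e <= keep}


def edge_induced(edge_subset):
    """Subgraph induced by an edge subset: the vertices are the edge endpoints."""
    chosen = set(edge_subset)
    return {v for e in chosen for v in e}, chosen


pentagram = {frozenset(e) for e in [(1, 3), (1, 4), (2, 4), (2, 5), (3, 5)]}
spokes = {frozenset((i, i + 5)) for i in range(1, 6)}
outer = {frozenset(e) for e in [(6, 7), (7, 8), (8, 9), (9, 10), (10, 6)]}  # assumed
petersen = pentagram | spokes | outer

# Subgraph induced by the vertex subset V' = {1, ..., 5}.
print(vertex_induced(petersen, {1, 2, 3, 4, 5})[1] == pentagram)

# Subgraph induced by the ten edges of Fig. 2.5; it is spanning when V' = V.
vertices, _ = edge_induced(pentagram | spokes)
print(vertices == set(range(1, 11)))
```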


2.4 Cliques, Independent Sets, Complements, 
and Covers 


Definition 2.34 (Clique). Let G = (V, E) be a graph. A clique is
a set $S \subseteq V$ of vertices so that:


(1) the subgraph induced by S is a complete graph (or in general
graphs, every pair of vertices in S is connected by at least one
edge in E); and

(2) if $S' \supsetneq S$, there is at least one pair of vertices in S’ that are not
connected by an edge in E.


Remark 2.35. There is sometimes a little contention about the def- 
inition of clique. Some people define it to be any set of vertices of a 
graph that induces a complete graph. That is, they drop the second 
property of Definition 2.34. In this case, a clique S satisfying the sec- 
ond property of Definition 2.34 is called a maximal clique because no 
other vertex can be added to the set while keeping the set a clique. 
A maximum cardinality clique of a graph is simply a clique that is
largest in size (number of elements) among all possible cliques of the 
graph. 


Definition 2.36 (Independent set). Let G = (V, E) be a graph.
An independent set of G is a set $I \subseteq V$ so that no pair of vertices in
I is joined by an edge in E. A set $I \subseteq V$ is a maximal independent
set if I is independent and if there is no other set $J \supsetneq I$ such that J
is also independent.


Example 2.37. The easiest way to think of cliques is as subgraphs
that are $K_n$, but so that no larger set of vertices induces a larger
complete graph. Maximal independent sets are the opposite of cliques.
The graph illustrated in Fig. 2.6(a) has three cliques. An independent
set is illustrated in Fig. 2.6(b).


Fig. 2.6 (a) A clique is a set of vertices in a graph that induces a complete graph 
as a subgraph and so that no larger set of vertices has this property. The graph 
in this figure has three cliques. (b) An independent set of vertices that are not 
adjacent is also shown. 


Definition 2.38 (Clique number). Let G = (V, E) be a graph.
The clique number of G, written $\omega(G)$, is the size (number of vertices)
of the largest clique in G.


Definition 2.39 (Independence number). The independence
number of a graph G = (V, E), written $\alpha(G)$, is the size of the largest
independent set of G.


Definition 2.40 (Graph complement). Let G = (V, E) be a
graph. The graph complement of G is a graph H = (V, E’) so that


$$e = \{v_1, v_2\} \in E' \iff \{v_1, v_2\} \notin E.$$


Example 2.41. In Fig. 2.7, the graph from Fig. 2.6 is illustrated (in 
a different spatial configuration) with its cliques. The complement 
of the graph is also illustrated. Note that in the complement, every 
clique is now an independent set. 


Theorem 2.42. Let G = (V,E) be a graph, and let H = (V,E’) be 
its complement. A set S is a clique in G if and only if S is a maximal 
independent set in H. 
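
Theorem 2.42 can be checked by exhaustion on a small graph. The Python sketch below is illustrative only: the function names and the four-vertex graph (a triangle with one pendant vertex) are assumptions of this sketch. It lists the cliques of G in the sense of Definition 2.34 and the maximal independent sets of the complement H and compares the two collections.

```python
from itertools import combinations


def complement(vertices, edges):
    """Edge set of the complement: all pairs of vertices that are not edges."""
    return {frozenset(p) for p in combinations(vertices, 2)} - set(edges)


def pairwise(S, edges, adjacent):
    """True if every pair in S is adjacent (adjacent=True) or non-adjacent (False)."""
    return all((frozenset(p) in edges) == adjacent for p in combinations(S, 2))


def maximal_sets(vertices, edges, adjacent):
    """All pairwise (non-)adjacent vertex sets that cannot be extended by a vertex."""
    found = set()
    for size in range(1, len(vertices) + 1):
        for S in combinations(vertices, size):
            extendable = any(pairwise(S + (v,), edges, adjacent)
                             for v in vertices if v not in S)
            if pairwise(S, edges, adjacent) and not extendable:
                found.add(frozenset(S))
    return found


V = (1, 2, 3, 4)                                                 # hypothetical graph
E = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4)]}

cliques_in_G = maximal_sets(V, E, adjacent=True)
independent_in_H = maximal_sets(V, complement(V, E), adjacent=False)
print(cliques_in_G == independent_in_H)
print(max(len(S) for S in cliques_in_G))   # clique number of this G
```

Because both properties are inherited by subsets, it is enough to test whether a single vertex can be added when deciding maximality.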


Remark 2.43. We may compare the graph complement to the
relative complement of a subgraph.


Fig. 2.7 A graph and its complement, with cliques illustrated in one and
independent sets illustrated in the other.


Fig. 2.8 A vertex cover is a set of vertices with the property that every edge has 
at least one endpoint inside the covering set. 


Definition 2.44 (Vertex cover). Let G = (V, E) be a graph.
A vertex cover is a set of vertices $S \subseteq V$ so that for all $e \in E$,
at least one element of e is in S; i.e., every edge in E is incident to
at least one vertex in S.


Example 2.45. A vertex cover is illustrated in Fig. 2.8. 


Theorem 2.46. A set I is an independent set in a graph G = (V, E)
if and only if the set $V \setminus I$ is a vertex cover in G.


Proof. ($\Rightarrow$) Suppose that I is an independent set and we choose
$e = \{v, v'\} \in E$. If $v \in I$, then $v' \notin I$ because I is independent, so
$v' \in V \setminus I$. By the same argument, if $v' \in I$, then $v \in V \setminus I$.
It is possible that neither v nor v’ is in I, but this does not affect
the fact that $V \setminus I$ must be a cover since for every edge $e \in E$, at
least one element of e is in $V \setminus I$.

($\Leftarrow$) Now, suppose that $V \setminus I$ is a vertex cover. Choose any two
vertices v and v’ in I. The fact that $V \setminus I$ is a vertex cover implies
that {v, v’} cannot be an edge in E: such an edge would contain no
element from $V \setminus I$, contradicting our assumption about
$V \setminus I$. Thus, I is an independent set since no two vertices in I are
connected by an edge in E. This completes the proof.
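
The equivalence in Theorem 2.46 can also be confirmed by brute force. The Python sketch below is illustrative: the function names and the five-vertex graph (a 5-cycle with one chord) are assumptions of this sketch. It tests every subset I of V and then compares the largest independent set with the smallest vertex cover.

```python
from itertools import combinations


def is_independent(S, edges):
    """No edge has both endpoints in S."""
    S = set(S)
    return not any(e <= S for e in edges)


def is_vertex_cover(S, edges):
    """Every edge has at least one endpoint in S."""
    S = set(S)
    return all(e & S for e in edges)


V = {1, 2, 3, 4, 5}                                              # hypothetical graph
E = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3)]}

subsets = [set(c) for k in range(len(V) + 1) for c in combinations(sorted(V), k)]

# I is independent exactly when V \ I is a vertex cover.
print(all(is_independent(I, E) == is_vertex_cover(V - I, E) for I in subsets))

alpha = max(len(I) for I in subsets if is_independent(I, E))
tau = min(len(S) for S in subsets if is_vertex_cover(S, E))
print(alpha + tau == len(V))   # largest independent set vs. smallest cover
```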


Remark 2.47. Theorem 2.46 shows that the problem of identifying
a largest independent set is essentially identical to the problem of
identifying a minimum (size) vertex cover. To see one example of
the utility of a vertex cover, imagine a graph structure defined
by the hallways of a building: edges are hallways, and vertices are the
intersections of hallways. Finding a minimum vertex cover answers the question:
What is the minimum number of guards or cameras that must be
used to monitor each hallway?


2.5 Chapter Notes 


Paul Erdős (1913-1996), also known as the “Magician from
Budapest” [4], is the most prolific mathematician in history (after
Euler), publishing over 1,400 papers [5]. Erdős was known for his
unusual lifestyle and deep love of mathematics. He is an academic
brother of John von Neumann, one of the most influential mathematicians
of the 20th century, as well as Pál Turán, with whom he
collaborated.

Tibor Gallai was also a Hungarian mathematician who worked
closely with Paul Erdős. Gallai worked in graph theory and is also
known for the Edmonds-Gallai decomposition [6], which is closely
related to matching in graphs, a topic we will study later. His advisor
was Kőnig, whose results on matchings and coverings we will be
studying later.

Václav Havel published a version of the Havel-Hakimi theorem
(algorithm) in 1955 [7]. Seifollah Louis Hakimi published his own
version in 1962 [8]. Hakimi was an academic descendant of Vannevar
Bush, one of the most influential engineers of the 20th century in the
United States.

The investigation of graphs with specific degree sequences became
popular as a result of the work in network science [9]. Scale-free
graphs (or networks) have a degree sequence that follows a power-law
distribution; that is, the number of vertices with a degree of d,
denoted by n(d), behaves according to the law $n(d) \propto d^{-\gamma}$, where
$\gamma$ is called the scaling constant. To find out why all this investigation
started, see Barabási and Albert’s original work [10], as well
as commentaries, e.g., Ref. [11]. A more mathematical (and graph- 
theoretic) perspective on the properties of graphs with specific degree 
distributions can be found in Refs. [12-15]. 

The graph covering problem is closely related to the dominating 
set problem. A dominating set in a graph G = (V, E) is a set of
vertices $D \subseteq V$ so that every vertex is either in D or adjacent to a
vertex in D. An example is the classical art gallery problem [16], in
which the objective is to position a minimum number of guards to 
guard an art gallery. The art gallery problem can be reduced to a 
dominating set problem for a specially designed graph. 


2.6 Exercises 


Exercise 2.1 
Prove Corollary 2.11. 


Exercise 2.2 
Prove Theorem 2.12. 


Exercise 2.3 
Prove Corollary 2.15. 


Exercise 2.4 
Decide whether the following sequence is graphic: d = (6,4,3,3, 
2,1,1). 


Exercise 2.5 
Develop a (recursive) algorithm based on Theorem 2.17 to determine 
whether a sequence is graphic. 


Exercise 2.6 
Prove Lemma 2.23 and Corollary 2.24. [Hint: Use Eq. (2.1).]


Exercise 2.7
Assume that the clique number of a graph G is 4. What is the inde- 
pendence number of the complement of G? 


Exercise 2.8 
Consider the following Petersen graph. 


Find a minimal (minimum size) vertex covering for this graph. Can 
you use this to find a maximal (maximum size) independent set? 


Exercise 2.9 
Find the clique and independence numbers of the graph shown in 
Fig. 2.6(a) and (b). 


Exercise 2.10 

Prove Theorem 2.42. [Hint: Use the definition of graph complement
and the fact that if an edge is present in a graph G, it must be absent
in its complement.]


Exercise 2.11 
Illustrate by exhaustion that removing any vertex from the proposed 
covering in Fig. 2.8 destroys the covering property.


Chapter 3


Walks, Cycles, Cuts, and Centrality 


Remark 3.1 (Chapter goals). In this chapter, we discuss the idea 
of traversing a graph along its edges in what is called a walk in a 
graph. We discuss special kinds of walks called Hamiltonian paths 
and Eulerian trails. Cycles are discussed along with additional graph- 
theoretic vocabulary. We discuss connected and disconnected graphs 
and special sets of edges and vertices (called cuts), whose removal 
disconnects a graph. Applications to centrality (vertex rankings) are 
also discussed. 


3.1 Paths, Walks, and Cycles 


Definition 3.2 (Walk). Let G = (V, E) be a graph. A walk
$w = (v_1, e_1, v_2, e_2, \ldots, v_n, e_n, v_{n+1})$ in G is an alternating sequence of
vertices and edges in V and E, respectively, so that for all $i = 1, \ldots, n$,
$\{v_i, v_{i+1}\} = e_i$. A walk is called closed if $v_1 = v_{n+1}$ and open otherwise.
A walk consisting of only one vertex is called trivial.


Definition 3.3 (Sub-walk). Let G = (V,E) be a graph. If w is 
a walk in G, then a sub-walk of w is any walk w’ that is also a 
subsequence of w. 


Remark 3.4. Let G = (V, E). To each walk


$$w = (v_1, e_1, v_2, e_2, \ldots, v_n, e_n, v_{n+1}),$$

we can associate a subgraph H = (V’, E’) with:


(1) $V' = \{v_1, \ldots, v_{n+1}\}$; and
(2) $E' = \{e_1, \ldots, e_n\}$.


We call this the subgraph induced by the walk w. 


Definition 3.5 (Trail/Tour). Let G = (V,E) be a graph. A trail 
in G is a walk in which no edge is repeated. A tour is a closed trail. 
A trail or tour is called Eulerian if it contains each edge in E exactly 
once. 


Definition 3.6 (Path). Let G = (V, E) be a graph. A path in G is a
nontrivial walk with no vertex and no edge repeated. A Hamiltonian
path is a path that contains exactly one copy of each vertex in V.


Definition 3.7 (Cycle). A closed walk with a length of at least 3 
and no repeated edges and in which the only repeated vertices are the 
first and the last is called a cycle. A cycle in a graph is Hamiltonian 
if it contains every vertex. 


Definition 3.8 (Hamiltonian and Eulerian graphs). A graph 
G = (V,E) is said to be Hamiltonian if it contains a Hamiltonian 
cycle and Eulerian if it contains an Eulerian tour. 
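
The definitions of walk, trail, path, and cycle differ only in what may be repeated, which a few short predicates make concrete. The Python sketch below is illustrative: the function names are hypothetical, a walk in a simple graph is represented by its vertex sequence alone (the edges are then implied), and the five-edge graph is an assumption of this sketch.

```python
def walk_edges(seq):
    """Edges traversed by a walk given as a list of vertices."""
    return [frozenset(pair) for pair in zip(seq, seq[1:])]


def is_walk(seq, edges):
    return len(seq) >= 1 and all(e in edges for e in walk_edges(seq))


def is_trail(seq, edges):
    used = walk_edges(seq)
    return is_walk(seq, edges) and len(set(used)) == len(used)   # no edge repeated


def is_path(seq, edges):
    # Nontrivial walk with no repeated vertex (so no edge is repeated either).
    return len(seq) >= 2 and is_walk(seq, edges) and len(set(seq)) == len(seq)


def is_cycle(seq, edges):
    # Closed trail of length at least 3 whose only repeated vertex is the first/last.
    return (len(seq) >= 4 and seq[0] == seq[-1] and is_trail(seq, edges)
            and len(set(seq[:-1])) == len(seq) - 1)


def is_eulerian_trail(seq, edges):
    return is_trail(seq, edges) and set(walk_edges(seq)) == set(edges)


def is_hamiltonian_path(seq, vertices, edges):
    return is_path(seq, edges) and set(seq) == set(vertices)


V = {1, 2, 3, 4}                                                 # hypothetical graph
E = {frozenset(e) for e in [(1, 4), (4, 2), (2, 3), (3, 1), (1, 2)]}

print(is_walk([1, 4, 1], E), is_trail([1, 4, 1], E))        # a walk that reuses an edge
print(is_trail([1, 4, 2, 1, 3], E), is_path([1, 4, 2, 1, 3], E))
print(is_hamiltonian_path([1, 4, 2, 3], V, E), is_cycle([1, 4, 2, 3, 1], E))
print(is_eulerian_trail([1, 4, 2, 3, 1, 2], E))
```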


Example 3.9. We illustrate an Eulerian trail and a Hamiltonian 
path in Fig. 3.1. Note that Vertex 1 is repeated in the trail, meaning 
that this is not a path. We contrast this with Fig. 3.1(b), which shows
a Hamiltonian path. Here, each vertex occurs exactly once in the 
illustrated path, but not all the edges are included. In this graph, it 
is impossible to have either a Hamiltonian cycle or an Eulerian tour. 


Example 3.10. An Eulerian graph and a separate Hamiltonian 
graph are shown in Fig. 3.2. The edges are numbered in the order 
one could use to show the tour or the cycle. Note that vertices are 
reused in the tour, while only the first and last vertices are the same 
in the Hamiltonian cycle (as required). 


Remark 3.11. We will return to the discussion of Eulerian graphs 
in Section 4.3. 


Definition 3.12 (Length). The length of a walk is the number of
edges contained in it.


Fig. 3.1 (a) An Eulerian trail and (b) a Hamiltonian path are illustrated.


Fig. 3.2. (a) An Eulerian tour and (b) a Hamiltonian cycle are illustrated. These 
graphs are Eulerian and Hamiltonian, respectively. 


Example 3.13. A walk is illustrated in Fig. 3.3(a). Formally, this
walk can be written as

$$w = (1, \{1,4\}, 4, \{4,2\}, 2, \{2,3\}, 3).$$

The cycle shown in Fig. 3.3(b) can be formally written as

$$c = (1, \{1,4\}, 4, \{4,2\}, 2, \{2,3\}, 3, \{3,1\}, 1).$$

Note that the cycle begins and ends with the same vertex (that’s
what makes it a cycle). Also, w is a sub-walk of c. Note further that
we could easily have represented the walk as

$$w = (3, \{3,2\}, 2, \{2,4\}, 4, \{4,1\}, 1).$$

In general, we can shift the ordering of the cycle in any way (for
example, beginning at Vertex 2). Thus, we see that in an undirected
graph, a cycle or walk representation may not be unique.


Fig. 3.3 (a) A walk and (b) a cycle are illustrated.


Remark 3.14. If w is a path in a graph G = (V, E), then the sub- 
graph induced by w is simply the graph composed of the vertices and 
edges in w. 


Definition 3.15 (Path graph). Suppose that G = (V, E) is a
graph with |V| = n. If w is a Hamiltonian path in G and H is
the subgraph induced by w and H = G, then G is called an n-path or
a path graph on n vertices, denoted by $P_n$.


Definition 3.16 (Cycle graph). If w is a Hamiltonian cycle in G
and H is the subgraph induced by w and H = G, then G is called an
n-cycle or a cycle graph on n vertices, denoted by $C_n$.


Example 3.17. We illustrate a cycle graph with six vertices (6-cycle
or $C_6$) and a path graph with four vertices (4-path or $P_4$) in Fig. 3.4.


Remark 3.18. Walks, cycles, paths, and tours can all be extended to
the case of digraphs. In this case, the walk, path, cycle, or tour must
respect the edge directionality. Thus, if $w = (\ldots, v_i, e_i, v_{i+1}, \ldots)$ is a
directed walk, then $e_i = (v_i, v_{i+1})$ is an ordered pair in the edge set
of the graph.


Fig. 3.4 (a) 6-cycle and (b) 4-path.


3.2. More Graph Properties: Diameter, Radius, 
Circumference, and Girth 


Definition 3.19 (Distance). Let G = (V, E). The distance
between $v_1$ and $v_2$ in V is the length of the shortest walk beginning
at $v_1$ and ending at $v_2$ if such a walk exists. Otherwise, it is
$+\infty$. We write $d_G(v_1, v_2)$ for the distance from $v_1$ to $v_2$ in G.


Definition 3.20 (Directed distance). Let G = (V, E) be a
digraph. The (directed) distance from $v_1$ to $v_2$ in V is the length
of the shortest directed walk beginning at $v_1$ and ending at $v_2$ if such
a walk exists. Otherwise, it is $+\infty$.
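
Distances in the sense of Definitions 3.19 and 3.20 can be computed with a breadth-first search, a standard technique that the text has not introduced at this point. The Python sketch below is illustrative: the function name `distances_from`, the adjacency-dictionary representation, and both small graphs are assumptions of this sketch. For a digraph, each vertex is mapped to its out-neighbors only, so the search follows edge directions.

```python
from collections import deque
import math


def distances_from(adj, source):
    """Distance from `source` to every vertex; math.inf marks unreachable vertices."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == math.inf:      # first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Undirected example: the path 1 - 2 - 3 plus an isolated vertex 4.
undirected = {1: [2], 2: [1, 3], 3: [2], 4: []}
print(distances_from(undirected, 1))

# Directed example: edges (1, 2) and (2, 3) only.
directed = {1: [2], 2: [3], 3: []}
print(distances_from(directed, 1)[3], distances_from(directed, 3)[1])
```

The directed example shows that directed distance need not be symmetric: vertex 3 is reachable from vertex 1, but not the other way around.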


Definition 3.21 (Diameter). Let G = (V, E) be a graph. The
diameter of G, denoted by diam(G), is the largest
distance in G. That is,


$$\mathrm{diam}(G) = \max_{v_1, v_2 \in V} d_G(v_1, v_2). \tag{3.1}$$


Definition 3.22 (Eccentricity). Let G = (V, E), and let $v_1 \in V$.
The eccentricity of $v_1$ is the largest distance from $v_1$ to any other
vertex $v_2$ in V. That is,


$$\mathrm{ecc}(v_1) = \max_{v_2 \in V} d_G(v_1, v_2). \tag{3.2}$$


Remark 3.23. Naturally, we can define diameter in terms of
eccentricity as


$$\mathrm{diam}(G) = \max_{v \in V} \mathrm{ecc}(v). \tag{3.3}$$


Definition 3.24 (Radius). Let G = (V, E). The radius of G is the
minimum eccentricity of any vertex in V. That is,


$$\mathrm{rad}(G) = \min_{v_1 \in V} \mathrm{ecc}(v_1) = \min_{v_1 \in V} \max_{v_2 \in V} d_G(v_1, v_2). \tag{3.4}$$


Definition 3.25 (Girth). Let G = (V,E) be a graph. If there is 
a cycle in G (that is, G has a cycle-graph as a subgraph), then the 
girth of G is the length of the shortest cycle. When G contains no 
cycle, the girth is defined as 0. 


Definition 3.26 (Circumference). Let G = (V,E) be a graph. If 
there is a cycle in G (that is, G has a cycle-graph as a subgraph), 
then the circumference of G is the length of the longest cycle. When 
G contains no cycle, the circumference is defined as $+\infty$.


Example 3.27. The eccentricities of the vertices of the graph shown
in Fig. 3.5 are:


(1) Vertex 1: 1,
(2) Vertex 2: 2,
(3) Vertex 3: 2,
(4) Vertex 4: 2, and
(5) Vertex 5: 2.


This means that the diameter of the graph is 2 and its radius is 1. We
have already seen that there is a 4-cycle subgraph in the graph (see
Fig. 3.3(b)). This is the largest cycle in the graph, so the circumference
of the graph is 4. There are several 3-cycles in the graph (an example
being the cycle (1, {1,2}, 2, {2,4}, 4, {4,1}, 1)). The smallest possible
cycle is a 3-cycle. Thus, the girth of the graph is 3.


Fig. 3.5 A graph with a diameter of 2, radius of 1, girth of 3, and circumference
of 4.
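
The quantities in Example 3.27 can be recomputed from an adjacency list. The Python sketch below is illustrative and rests on an assumption: the figure itself is not reproduced here, so the edge set used is a hypothetical one chosen to be consistent with the eccentricities, cycles, and edges quoted in the text, and it may differ from the graph in Fig. 3.5. The function names are also hypothetical. The girth routine runs a breadth-first search from every vertex and records the shortest closed walk found through a non-tree edge; following Definition 3.25, it returns 0 when there is no cycle.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def eccentricity(adj, v):
    """Largest distance from v; assumes the graph is connected."""
    return max(bfs_distances(adj, v).values())


def girth(adj):
    """Length of a shortest cycle in a simple graph, or 0 if there is none."""
    best = 0
    for root in adj:
        dist, parent, queue = {root: 0}, {root: None}, deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w], parent[w] = dist[u] + 1, u
                    queue.append(w)
                elif parent[u] != w:               # a non-tree edge closes a cycle
                    length = dist[u] + dist[w] + 1
                    if best == 0 or length < best:
                        best = length
    return best


# Assumed edge set: {1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}.
adj = {1: [2, 3, 4, 5], 2: [1, 3, 4], 3: [1, 2], 4: [1, 2], 5: [1]}

ecc = {v: eccentricity(adj, v) for v in adj}
print(ecc)
print(max(ecc.values()), min(ecc.values()), girth(adj))   # diameter, radius, girth
```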


3.3. Graph Components 


Definition 3.28 (Reachability). Let G = (V, E), and let $v_1$ and
$v_2$ be two vertices in V. Then, $v_2$ is reachable from $v_1$ if there is a
walk w beginning at $v_1$ and ending at $v_2$ (alternatively, the distance
from $v_1$ to $v_2$ is not $+\infty$). If G is a digraph, we assume that the walk
is directed.


Definition 3.29 (Connectedness). A graph G is connected if for
every pair of vertices $v_1$ and $v_2$ in V, $v_2$ is reachable from $v_1$. If G is
a digraph, then G is connected if its underlying graph is connected.
A graph that is not connected is called disconnected.


Definition 3.30 (Strong connectedness). A digraph G is
strongly connected if for every pair of vertices $v_1$ and $v_2$ in V, $v_2$
is reachable (by a directed walk) from $v_1$.


Remark 3.31. In Definition 3.30, we are really requiring, for any
pair of vertices $v_1$ and $v_2$, that $v_1$ be reachable from $v_2$ and $v_2$ be
reachable from $v_1$ by directed walks. If this is not possible, then a
directed graph could be connected but not strongly connected.


Fig. 3.6 (a) A connected graph, (b) a disconnected graph with two components,
and (c) a connected digraph that is not strongly connected.


Example 3.32. In Fig. 3.6, we illustrate a connected graph, a dis- 
connected graph, and a connected digraph that is not strongly con- 
nected. 


Definition 3.33 (Component). Let G = (V, E) be a graph. A
subgraph H of G is a component of G if:


(1) H is connected; and

(2) if K is a subgraph of G and H is a proper subgraph of K, then
K is not connected.

The number of components of a graph G is written as c(G).


Remark 3.34. A component H of a graph G is a maximally connected
subgraph of G. Here, maximal is taken with respect to the
subgraph ordering. That is, H is less than K in the subgraph ordering
if H is a proper subgraph of K.


Example 3.35. Figure 3.6(b) contains two components: a 3-cycle and
a 2-path.


Proposition 3.36. A connected graph G has exactly one component. 
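
Components can be found by repeatedly collecting everything reachable from a vertex that has not yet been visited. The Python sketch below is illustrative: the function name `components` and the vertex labels are assumptions of this sketch, with the example graph modeled on the description in Example 3.35 (a 3-cycle together with a 2-path).

```python
def components(vertices, edges):
    """Return the vertex sets of the components of an undirected graph."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    seen, found = set(), []
    for v in vertices:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:                      # depth-first search from v
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        found.append(comp)
    return found


# Hypothetical labels: a 3-cycle on {1, 2, 3} and a 2-path on {4, 5}.
parts = components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 1), (4, 5)])
print(parts, len(parts))                  # the components and c(G)

# A connected graph has exactly one component (Proposition 3.36).
print(len(components([1, 2, 3], [(1, 2), (2, 3)])))
```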


Definition 3.37 (Edge deletion graph). Let G = (V, E), and let
$E' \subseteq E$. Then, the graph G’ resulting from deleting the edges in E’
from G is the graph with vertex set V and edge set $E \setminus E'$; every
vertex of G is kept, even if no remaining edge contains it. We write this
as G’ = G − E’.


Definition 3.38 (Vertex deletion graph). Let G = (V, E), and
let $V' \subseteq V$. Then, the graph G’ resulting from deleting the vertices in
V’ from G is the subgraph induced by the vertex set $V \setminus V'$. We write
this as G’ = G − V’.


Definition 3.39 (Vertex cut and cut vertex). Let G = (V, E)
be a graph. A set $V' \subseteq V$ is a vertex cut if the graph G’ resulting
from deleting vertices V’ from G has more components than graph
G. If V’ = {v} is a vertex cut, then v is called a cut vertex.


Definition 3.40 (Edge cut and cut edge). Let G = (V, E) be a
graph. A set $E' \subseteq E$ is an edge cut if the graph G’ resulting from
deleting edges E’ from G has more components than graph G. If
E’ = {e} is an edge cut, then e is called a cut edge.


Definition 3.41 (Minimal edge cut). Let G = (V, E). An edge
cut E’ of G is minimal if when we remove any edge from E’ to form
E’’, the new set E’’ is no longer an edge cut.


Example 3.42. In Fig. 3.7, we illustrate a vertex cut and a cut
vertex (a singleton vertex cut) and an edge cut and a cut edge (a
singleton edge cut). Note that the edge cut in Fig. 3.7(a) is minimal
and cut edges are always minimal. A cut edge is sometimes
called a bridge because it is the only connection between two parts of
the graph that become distinct components when it is removed.
Bridges (and small edge cuts) are a very important part of
social network analysis [17,18] because they represent connections
between different communities. To see this, suppose that Fig. 3.7(b)
represents the communication patterns between two groups. The fact
that Members 5 and 6 communicate and that these are the only two
individuals who communicate between these two groups could be
important in understanding how these groups interrelate.


(a) Edge Cut and Cut Vertex (b) Vertex Cut and Cut Edge


Fig. 3.7 Illustrations of a vertex cut and a cut vertex (a singleton vertex cut)
and an edge cut and a cut edge (a singleton edge cut). Cuts are sets of vertices
or edges whose removal from a graph creates a new graph with more components
than the original graph.
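
Cut vertices and cut edges can be found directly from Definitions 3.39 and 3.40 by deleting one vertex or one edge at a time and counting components. The Python sketch below is illustrative: the function names and the six-vertex graph (two triangles joined by a single edge) are assumptions of this sketch and are not the graphs drawn in Fig. 3.7. Deleting an edge keeps all vertices, while deleting a vertex also removes every edge that contains it.

```python
def count_components(vertices, edges):
    """Number of components of the graph (vertices, edges)."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, w = tuple(e)
        adj[u].add(w)
        adj[w].add(u)
    seen, count = set(), 0
    for v in vertices:
        if v not in seen:
            count += 1
            stack = [v]
            while stack:
                u = stack.pop()
                if u not in seen:
                    seen.add(u)
                    stack.extend(adj[u] - seen)
    return count


def cut_vertices(vertices, edges):
    base = count_components(vertices, edges)
    return {v for v in vertices
            if count_components(vertices - {v},
                                {e for e in edges if v not in e}) > base}


def cut_edges(vertices, edges):
    base = count_components(vertices, edges)
    return {e for e in edges if count_components(vertices, edges - {e}) > base}


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by the edge {3, 4}.
V = {1, 2, 3, 4, 5, 6}
E = {frozenset(e) for e in
     [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]}

print(cut_vertices(V, E))   # the endpoints of the bridge
print(cut_edges(V, E))      # the bridge itself
```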


Theorem 3.43. Let G = (V, E) be a connected graph, and let $e \in E$.
Then, G’ = G − {e} is connected if and only if e lies on a cycle in G.


Proof. ($\Leftarrow$) Recall that a graph G is connected if and only if for
every pair of vertices $v_1$ and $v_{n+1}$, there is a walk w from $v_1$ to $v_{n+1}$
with


$$w = (v_1, e_1, v_2, \ldots, v_n, e_n, v_{n+1}).$$


Let G’ = G − {e}. Suppose that e lies on a cycle c in G, and choose
two vertices $v_1$ and $v_{n+1}$ in G. If e is not on any walk w connecting
$v_1$ to $v_{n+1}$ in G, then the removal of e does not affect the reachability
of $v_1$ and $v_{n+1}$ in G’. Therefore, assume that e is in the walk w. The
fact that e is in a cycle of G implies that we have vertices $u_1, \ldots, u_m$
and edges $f_1, \ldots, f_m$ so that


$$c = (u_1, f_1, u_2, \ldots, u_m, f_m, u_1)$$


is a cycle and e is among $f_1, \ldots, f_m$. Without loss of generality,
assume that $e = f_m$ and that $e = \{u_m, u_1\}$. (Otherwise, we can
reorder the cycle to make this true.) Then, in G’, we have the path


$$c' = (u_1, f_1, \ldots, f_{m-1}, u_m).$$


The fact that e is in the walk w implies that there are vertices $v_i$ and
$v_{i+1}$ so that $e = \{v_i, v_{i+1}\}$ (with $v_i = u_1$ and $v_{i+1} = u_m$). In deleting
e from G, we remove the sub-walk $(v_i, e, v_{i+1})$ from w. However, we
can create a new walk with the structure


$$w' = (v_1, e_1, \ldots, v_i, f_1, u_2, \ldots, u_{m-1}, f_{m-1}, u_m, \ldots, e_n, v_{n+1}).$$


This is illustrated in Fig. 3.8.

($\Rightarrow$) Suppose G’ = G − {e} is connected. Now, let $e = \{v_1, v_{n+1}\}$.
Since G’ is connected, there is a walk from $v_1$ to $v_{n+1}$ in G’. Applying
Remark 4.28, we can reduce this walk to a path p, with


$$p = (v_1, e_1, \ldots, v_n, e_n, v_{n+1}).$$


Since p is a path, there are no repeated vertices in p. We can construct
a cycle c containing e in G as


$$c = (v_1, e_1, \ldots, v_n, e_n, v_{n+1}, e, v_1)$$


since $e = \{v_1, v_{n+1}\} = \{v_{n+1}, v_1\}$. Thus, e lies on a cycle in G. This
completes the proof.


Fig. 3.8 If e lies on a cycle, then we can repair path w by going the long way
around the cycle to reach $v_{n+1}$ from $v_1$.


Corollary 3.44. Let G = (V, E) be a connected graph, and let $e \in E$.
The edge e is a cut edge if and only if e does not lie on a cycle in G.
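
Corollary 3.44 can be checked on a small graph by comparing two separate computations: an explicit search for a cycle through e, and a connectivity test after e is deleted. The Python sketch below is illustrative: the function names and the six-vertex graph are assumptions of this sketch. The cycle search mirrors the second half of the proof of Theorem 3.43 by looking for a path between the endpoints of e that avoids e and then closing it with e.

```python
def neighbors(edges):
    """Adjacency sets built from a collection of two-element frozensets."""
    adj = {}
    for a, b in map(tuple, edges):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def cycle_through(edges, e):
    """Return a cycle containing edge e as a vertex list, or None if none exists."""
    u, w = tuple(e)
    adj = neighbors(edges - {e})
    stack = [[u]]                          # partial paths from u that avoid e
    while stack:
        path = stack.pop()
        if path[-1] == w:
            return path + [u]              # close the path with the edge e
        stack.extend(path + [x] for x in adj.get(path[-1], ()) if x not in path)
    return None


def is_connected(vertices, edges):
    adj = neighbors(edges)
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj.get(v, set()) - seen)
    return seen == set(vertices)


# Hypothetical graph: two triangles joined by the edge {3, 4}.
V = {1, 2, 3, 4, 5, 6}
E = {frozenset(p) for p in
     [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]}

for e in sorted(E, key=sorted):
    on_cycle = cycle_through(E, e) is not None
    print(sorted(e), on_cycle, is_connected(V, E - {e}))
```

For every edge, the last two columns agree: the graph stays connected after deleting e exactly when e lies on a cycle.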


Definition 3.45 (k-connected). A graph is k-connected if it has
at least k vertices and if the graph remains connected after removing
fewer than k vertices (to form the vertex deletion graph).


Remark 3.46. We pick up the study of connectivity when we discuss 
flows and cuts on graphs in Chapter 6. In particular, k-connectivity 
is an important part of building reliable networks [19]. 


Remark 3.47. The following result is taken from extremal graph 
theory, the study of extremes or bounds in the properties of graphs. 
There are a number of results in extremal graph theory that are of 
interest. See Ref. [20] for a complete introduction. 


Theorem 3.48. If G = (V, E) is a graph with n vertices and k
components, then

$$|E| \leq \frac{(n-k+1)(n-k)}{2}. \tag{3.5}$$


Proof. Assume that each component of G has $n_i$ vertices in it, with
$\sum_{i=1}^{k} n_i = n$. Applying Lemma 2.23, we know that component i has
at most $n_i(n_i - 1)/2$ edges, with equality when the component is a complete
graph on $n_i$ vertices. This is the largest number of edges that can
occur under these assumptions.

Consider the case where k − 1 of the components have exactly
one vertex and the remaining component has n − (k − 1) vertices.
Then, the total number of edges in this case is at most

$$\frac{(n-(k-1))(n-(k-1)-1)}{2} = \frac{(n-k+1)(n-k)}{2}.$$

It now suffices to show that this case has the highest number of edges
of all cases where the k components are each complete graphs.
Consider the case when component i is $K_r$ and component j is
$K_s$ with $r, s \geq 2$, and suppose $r \geq s$. Then, the total number of
edges in these two components is

$$\frac{r(r-1) + s(s-1)}{2} = \frac{r^2 + s^2 - r - s}{2}.$$

Now, suppose we move one vertex in component j to component i.
Then, component i is now $K_{r+1}$ and component j is now $K_{s-1}$.
Applying Lemma 2.23, the number of edges in this case is

$$\frac{(r+1)r + (s-1)(s-2)}{2} = \frac{r^2 + r + s^2 - 3s + 2}{2}.$$

Observe that since $r \geq s$, substituting s for r, we have

$$r^2 + r + s^2 - 3s + 2 \geq r^2 + s^2 - 2s + 2.$$

By a similar argument,

$$r^2 + s^2 - 2s \geq r^2 + s^2 - r - s.$$

Thus, we conclude that

$$\frac{r^2 + r + s^2 - 3s + 2}{2} \geq \frac{r^2 + s^2 - 2s + 2}{2} > \frac{r^2 + s^2 - 2s}{2} \geq \frac{r^2 + s^2 - r - s}{2}.$$

Repeating this argument over and over shows that in a k-component
graph with n vertices, the largest number of edges must occur in the
case when there is one complete component with n − (k − 1) vertices
and k − 1 components with exactly one vertex. Thus, the number
of edges in a graph with k components must satisfy Eq. (3.5). This
completes the proof.


Corollary 3.49. Any graph with n vertices and more than
(n − 1)(n − 2)/2 edges is connected.
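
For small n, the bound in Eq. (3.5) can be compared against every graph on n labeled vertices. The Python sketch below is illustrative: the function names are hypothetical, and n = 5 is chosen only to keep the enumeration small (there are $2^{10} = 1024$ such graphs). For each number of components k, it records the largest edge count seen and compares it with $(n-k+1)(n-k)/2$.

```python
from itertools import combinations


def max_edges(n, k):
    """Right-hand side of Eq. (3.5)."""
    return (n - k + 1) * (n - k) // 2


def count_components(n, edges):
    """Number of components on vertices 0..n-1, via a simple union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, w in edges:
        parent[find(u)] = find(w)
    return len({find(v) for v in range(n)})


n = 5
pairs = list(combinations(range(n), 2))
most_edges = {}                              # k -> largest |E| among graphs with k components
for mask in range(1 << len(pairs)):
    edges = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
    k = count_components(n, edges)
    most_edges[k] = max(most_edges.get(k, 0), len(edges))

for k in sorted(most_edges):
    print(k, most_edges[k], max_edges(n, k))
```

With k = 2, the bound is (n − 1)(n − 2)/2, so any graph with more edges than that cannot have two or more components, which is the content of Corollary 3.49.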


3.4 Introduction to Centrality


Remark 3.50. There are many situations in which we’d like to measure
the importance of a vertex in a graph. The problem of measuring
this quantity is usually called determining a vertex’s centrality.


Definition 3.51 (Degree centrality). Let G = (V, E) be a graph.
The degree centrality of a vertex is just its degree. That is,


$$C_D(v) = \deg(v).$$

These values can be normalized to lie in the interval [0,1] by using


$$\hat{C}_D(v) = \frac{\deg(v)}{2|E|}.$$


Remark 3.52. Degree centrality is only the simplest measurement 
of centrality. There are many other measures of this quantity. We 
discuss one more and then continue our discussion of this topic in 
Chapter 10. 


Definition 3.53 (Geodesic centrality). Let $G = (V,E)$ be a
graph. The geodesic centrality (sometimes called the betweenness
centrality) of a vertex $v \in V$ is the fraction of times $v$ occurs on any
shortest path connecting any other pair of vertices $s,t \in V$. Put more
formally, let $\sigma_{st}$ be the total number of shortest paths connecting
vertex $s$ to vertex $t$. Let $\sigma_{st}(v)$ be the number of these shortest paths
containing $v$. The geodesic centrality of $v$ is
\[
C_B(v) = \sum_{s \neq t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}. \tag{3.6}
\]
These values can be normalized so that they fall within $[0,1]$ by
dividing each $C_B(v)$ by the sum of all $C_B(v)$ so that
\[
\hat{C}_B(v) = \frac{C_B(v)}{\sum_{v' \in V} C_B(v')}.
\]

Example 3.54. Consider the graph with four vertices shown in
Fig. 3.9. The degrees of the graph are $(2,3,3,2)$, which is the
unnormalized degree centrality. The normalized degree centrality of the
vertices is
\[
\hat{C}_D(v_1) = \frac{1}{5}, \quad \hat{C}_D(v_2) = \frac{3}{10}, \quad
\hat{C}_D(v_3) = \frac{3}{10}, \quad \hat{C}_D(v_4) = \frac{1}{5}.
\]


To compute the normalized geodesic centrality, we must compute
the fraction of times a vertex appears in a shortest path. This is
shown in Table 3.1. In the vertex pair $(1,2)$, there is exactly one
shortest path connecting 1 to 2. Since 1 and 2 are the end points,
they are not counted. Vertices 3 and 4 do not appear in this shortest
path, so they each receive a zero. For $(1,4)$, there are two shortest
paths (one through 2 and the other through 3); therefore, half of
the shortest paths contain vertex 2 and half of the shortest paths
contain vertex 3. The remainder of the table is filled out in exactly
the same way.


Fig. 3.9 Graph with four vertices.


Table 3.1 A table showing the intermediate computations for geodesic
centrality.

Vertex Pair    1     2     3     4
(1,2)          -     -     0     0
(1,3)          -     0     -     0
(1,4)          -    1/2   1/2    -
(2,3)          0     -     -     0
(2,4)          0     -     0     -
(3,4)          0     0     -     -
SUM            0    1/2   1/2    0


The normalized geodesic centrality is
\[
\hat{C}_B(v_1) = 0, \quad \hat{C}_B(v_2) = \frac{1}{2}, \quad
\hat{C}_B(v_3) = \frac{1}{2}, \quad \hat{C}_B(v_4) = 0.
\]


In this case, we see that the geodesic centrality orders the vertices in
the same way as the degree centrality but assigns them different values.
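
The computation in this example can be reproduced with a short program. The sketch below is illustrative only. It assumes the graph of Fig. 3.9 has edges {1,2}, {1,3}, {2,3}, {2,4}, and {3,4}, which is inferred from the degree sequence and from the shortest paths described above rather than read from the figure. It counts shortest paths by breadth-first search and sums over unordered vertex pairs, as in Table 3.1. The function names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def shortest_path_counts(adj, s):
    """Breadth-first search from s.

    Returns (dist, sigma), where dist[v] is the distance from s to v and
    sigma[v] is the number of shortest paths from s to v.
    """
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to w.
                sigma[w] += sigma[u]
    return dist, sigma


def geodesic_centrality(adj):
    """Unnormalized geodesic centrality over unordered pairs {s, t}."""
    info = {s: shortest_path_counts(adj, s) for s in adj}
    centrality = {v: Fraction(0) for v in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t lie in different components
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v is on a shortest s-t path exactly when distances add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                centrality[v] += Fraction(sigma_s[v] * sigma_t[v], sigma_s[t])
    return centrality


# Assumed edge set for the four-vertex graph of Fig. 3.9.
adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}

degrees = {v: len(nbrs) for v, nbrs in adj.items()}
cb = geodesic_centrality(adj)
total = sum(cb.values())
normalized = {v: c / total for v, c in cb.items()}

if __name__ == "__main__":
    print("degrees:", degrees)
    print("geodesic centrality:", normalized)
```

Exact fractions are used so that the output can be compared directly with the SUM row of Table 3.1.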


Remark 3.55. It’s clear from this analysis that cut vertices should 
have high geodesic centrality if they connect two large components 
of a graph. Thus, by some measures, cut vertices are very important 
elements of graphs. 


3.5 Chapter Notes 


Hamiltonian cycles (and paths) are named after William Rowan 
Hamilton, the most famous Irish mathematician. Most math students 
know Hamilton for his work on the quaternions [1]. Most physics stu- 
dents know Hamilton for his invention of Hamiltonian mechanics [21], 
which rephrased Newton’s laws in a much more general language, 
helping to pave the way for quantum mechanics. Hamiltonian cycles 
are named after Hamilton as a result of his invention of the icosian 
game, which involves finding Hamiltonian cycles on the edges of a 
dodecahedron. In reality, Hamiltonian paths (as they came to be
known) were studied decades earlier by both Kirkman [22] and (of
course) Euler in the form of knight’s tours on chessboards [23]. In
a knight’s tour problem, the graph is composed of vertices given by
chess squares, and the set of legal moves a knight may make forms
the edge set. This is illustrated in Fig. 3.10.


Fig. 3.10 A knight’s tour showing that a knight can transit every chess square
making only legal moves and returning to its starting position.

Extremal graph theory is a subfield within graph theory dealing 
with the relationship between local and global structures in graphs. 
The field was first investigated by Mantel and Turán [20]. Naturally,
Erdős made a contribution as well [24].

Work in social networks is largely distinct from graph theory as
a mathematical study. Zachary’s karate club [25] is a common first
example of the use of graph theory in the study of social networks.
However, the use of centrality in studying social networks predates
this by at least two decades with the work of Katz [26], who defined
“Katz centrality.” This measure is similar to PageRank centrality,
which we will discuss in later chapters. PageRank was (of course) created by
Brin and Page [27] and formed the basis for the Google search engine.


3.6 Exercises 


Exercise 3.1 

Prove that it is not possible for a Hamiltonian cycle or Eulerian tour 
to exist in the graph in Fig. 3.3(a); i.e., prove that the graph is neither 
Hamiltonian nor Eulerian. 


Exercise 3.2 
Suppose a graph is Hamiltonian. Does it have a cut vertex? Explain 
your answer. 


Exercise 3.3 

Formally define for yourself: directed walks, directed cycles, directed 
paths, and directed tours for directed graphs. [Hint: Begin with
Definition 3.2 and make appropriate changes. Then, do this for cycles,
tours, etc.]


Exercise 3.4 
Show that Eq. (3.3) is true. 


Exercise 3.5 
Compute the diameter, radius, girth, and circumference of the 
Petersen graph. 


Exercise 3.6 

In an online social network, Alice is friends with Bob and Charlie. 
Charlie is friends with David and Edward. Edward is friends with 
Bob.


(1) Find a maximal (maximum size) independent set in the resulting 
social network graph. What can you interpret from such a set on 
a social network? 

(2) Find the diameter of the social network graph. 


Exercise 3.7 
Prove Proposition 3.36. 


Exercise 3.8
What are the sizes of the smallest vertex and edge cuts in the cycle 
graph Ci? 


Exercise 3.9 
Prove Corollary 3.44. 


Exercise 3.10 
Prove Corollary 3.49. 


Exercise 3.11 
Compute the geodesic centrality and the degree centrality for the 
graph shown in Fig. 3.11. Compare your results. 


Fig. 3.11 The graph for which centralities are to be computed. 


Chapter 4 


Bipartite, Acyclic, and 
Eulerian Graphs 


Remark 4.1 (Chapter goals). In this chapter, we build on our 
work on walks and paths to discuss special graphs: bipartite graphs, 
acyclic graphs, and Eulerian graphs. We prove results that charac- 
terize these graphs. Special attention will be paid to trees, which are 
critical for computer science and game theory. 


4.1 Bipartite Graphs 


Definition 4.2. A graph $G = (V,E)$ is bipartite if $V = V_1 \cup V_2$ and
$V_1 \cap V_2 = \emptyset$, and if $e \in E$, then $e = \{v_1, v_2\}$, with
$v_1 \in V_1$ and $v_2 \in V_2$.
This definition is valid for non-simple graphs as well.


Remark 4.3. In a bipartite graph, we can think of the vertices as 
belonging to one of two classes (either $V_1$ or $V_2$) and edges only
existing between elements of the two classes, not between elements 
in the same class. We can also define n-partite graphs, in which the 
vertices are in any of n classes and there are only edges between 
classes, not within classes. 


Example 4.4. Figure 4.1 shows a bipartite graph in which $V_1 =
\{1,2,3\}$ and $V_2 = \{4,5\}$. Note that there are only edges connecting
vertices in $V_1$ and vertices in $V_2$. There are no edges connecting
elements in $V_1$ to other elements in $V_1$ or elements in $V_2$ to other
elements in $V_2$.

Fig. 4.1 A bipartite graph has two classes of vertices, and edges in the graph
only exist between elements of different classes. 


Definition 4.5 (Complete bipartite graph). The graph $K_{m,n}$ is
the complete bipartite graph consisting of the vertex set $V =
\{v_{11},\dots,v_{1m}\} \cup \{v_{21},\dots,v_{2n}\}$ and having an edge
connecting every element of $V_1 = \{v_{11},\dots,v_{1m}\}$ to every
element of $V_2 = \{v_{21},\dots,v_{2n}\}$.


Definition 4.6 (Path concatenation). Let
\[
p_1 = (v_1, e_1, v_2, \dots, v_n, e_n, v_{n+1}) \quad \text{and} \quad
p_2 = (v_{n+1}, e_{n+1}, v_{n+2}, \dots, v_{n+m}, e_{n+m}, v_{n+m+1}).
\]
Then, the concatenation of path $p_1$ with path $p_2$ is the path
\[
p = (v_1, e_1, v_2, \dots, v_n, e_n, v_{n+1}, e_{n+1}, v_{n+2}, \dots,
v_{n+m}, e_{n+m}, v_{n+m+1}).
\]


Remark 4.7. Path concatenation is used in the proof of Theo- 
rem 4.8. 


Theorem 4.8. A graph G = (V,E) is bipartite if and only if every 
cycle in G has an even length. 


Proof. ($\Rightarrow$) Suppose $G$ is bipartite. Every cycle begins and ends
at the same vertex and, therefore, in either $V_1$ or $V_2$. Without loss
of generality, suppose we start in $V_1$. Every edge joins a vertex in
$V_1$ to a vertex in $V_2$, so starting at a vertex $v_1 \in V_1$,
we must take a walk of length 2 to return to $V_1$. The same is true if
we start at a vertex in $V_2$. Thus, every cycle must contain an even
number of edges in order to return to either $V_1$ or $V_2$.

($\Leftarrow$) Suppose that every cycle in $G$ has an even length. Without
loss of generality, assume $G$ is connected. We create a partition of $V$
so that $V = V_1 \cup V_2$ and $V_1 \cap V_2 = \emptyset$, and there is no edge between
vertices if they are in the same class.

Choose an arbitrary vertex v € V, and define 


\begin{align}
V_1 &= \{v' \in V : d_G(v,v') \equiv 0 \bmod 2\} \quad \text{and} \tag{4.1} \\
V_2 &= \{v' \in V : d_G(v,v') \equiv 1 \bmod 2\}. \tag{4.2}
\end{align}


Clearly, $V_1$ and $V_2$ constitute a partition of $V$. Choose $u_1, u_2 \in V_1$,
and suppose there is an edge $e = \{u_1, u_2\} \in E$. The distance from
$v$ to $u_1$ is even, so there is a path $p_1$ with an even number of edges
beginning at $v$ and ending at $u_1$. Likewise, the distance from $v$ to $u_2$
is even, so there is a path $p_2$ beginning at $u_2$ and ending at $v$ with
an even number of edges. If we concatenate path $p_1$, the length
1 path $q = (u_1, \{u_1, u_2\}, u_2)$, and path $p_2$, we obtain a closed walk
in $G$ of odd length, which must contain a cycle of odd length (see
Fig. 4.2). This contradicts our assumption. Therefore, there can be no edge
connecting two vertices in $V_1$.

Choose $u_1, u_2 \in V_2$, and suppose there is an edge $e = \{u_1, u_2\} \in E$.
Using the same argument, there is a path $p_1$ of odd length from $v$ to
$u_1$ and a path $p_2$ of odd length from $u_2$ to $v$. If we concatenate path
$p_1$, the length 1 path $q = (u_1, \{u_1, u_2\}, u_2)$, and path $p_2$, we again
obtain a closed walk of odd length, and hence a cycle of odd length, in
$G$. Therefore, there can be no edge connecting two vertices in $V_2$.

In the case when $G$ has more than one component, execute the
process described above for each component to obtain partitions
$V_1, V_2, V_3, V_4, \dots, V_{2n}$. Create a bipartition $U_1$ and $U_2$ of $V$ with
\begin{align}
U_1 &= \bigcup_{k=1}^{n} V_{2k-1} \quad \text{and} \tag{4.3} \\
U_2 &= \bigcup_{k=1}^{n} V_{2k}. \tag{4.4}
\end{align}
There can be no edge connecting two vertices in $U_1$ or two vertices in $U_2$.
Thus, $G$ is bipartite. This completes the proof.


Fig. 4.2 Illustration of the main argument in the proof that a graph is bipartite
if and only if all cycles have even length.
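
The construction in this proof translates directly into a test for bipartiteness. The following is a minimal sketch under stated assumptions: the graph is given as an adjacency dictionary (a representation chosen here for illustration), each component is searched breadth-first from an arbitrary root, and vertices are classified by the parity of their distance from that root, mirroring Eqs. (4.1) and (4.2). The function name `bipartition` is hypothetical.

```python
from collections import deque


def bipartition(adj):
    """Return (U1, U2) if the graph is bipartite and None otherwise.

    adj maps each vertex to a list of its neighbors. U1 collects the
    vertices at even distance from the root of their component, and U2
    collects the vertices at odd distance.
    """
    parity = {}
    for root in adj:
        if root in parity:
            continue  # this component has already been processed
        parity[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parity:
                    parity[w] = 1 - parity[u]
                    queue.append(w)
                elif parity[w] == parity[u]:
                    # An edge inside one class implies an odd cycle.
                    return None
    u1 = {v for v, p in parity.items() if p == 0}
    u2 = {v for v, p in parity.items() if p == 1}
    return u1, u2


if __name__ == "__main__":
    four_cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
    triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
    print(bipartition(four_cycle))
    print(bipartition(triangle))
```

The 4-cycle and the triangle are examples chosen for this sketch. The first has only even cycles and is split into two classes, while the second contains an odd cycle and is rejected.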


4.2 Acyclic Graphs and Trees 


Definition 4.9 (Acyclic graph). A graph that contains no cycles 
is called acyclic. 


Definition 4.10 (Forests and trees). Let G = (V,E) be an 
acyclic graph. If G has more than one component, then G is called 
a forest. If G has one component, then G is called a tree. 


Example 4.11. A randomly generated tree with 10 vertices is shown 
in Fig. 4.3. Note that the tree (if drawn upside down) can be made 
to look like a real tree growing up from the ground. 


Proposition 4.12. Every tree is a bipartite graph. 


Remark 4.13. We can define directed trees and directed forests as 
acyclic directed graphs. Generally, we require the underlying graphs 
to be acyclic rather than just having no directed cycles. Directed trees 
are used frequently in computer science [28], operations research [29], 
and game theory [30,31]. For the remainder of this chapter, we deal 
with undirected trees, but the results presented will also apply to 
directed trees unless otherwise stated. 


Definition 4.14 (Spanning forest). Let G = (V,E) be a graph. 
If F = (V’, E’) is an acyclic subgraph of G such that V = V’, then 
F is called a spanning forest of G. If F has exactly one component, 
then $F$ is called a spanning tree.


Example 4.15. A spanning tree for the Petersen graph is illustrated
in Fig. 4.4. Since the Petersen graph is connected, it is easy to see that
we do not need a spanning forest to construct an acyclic spanning
subgraph.


Fig. 4.3 A tree is shown. Imagining the tree upside down illustrates the tree-like
nature of the graph structure.


Fig. 4.4 The Petersen graph with a spanning tree shown below.


Theorem 4.16. If G = (V,E) is a connected graph, then there is a 
spanning tree T = (V,E’) of G. 


Proof. We proceed by induction on the number of vertices in $G$. If
$|V| = 1$, then $G$ is itself a (degenerate) tree and thus has a spanning
tree. Now, suppose that the statement is true for all graphs $G$ with
$|V| \leq n$. Consider a graph $G$ with $n+1$ vertices. Choose an arbitrary
vertex $v_{n+1}$ and remove it and all edges of the form $\{v, v_{n+1}\}$
from $G$ to form $G'$ with vertex set $V' = \{v_1,\dots,v_n\}$. The graph $G'$
has $n$ vertices and may have $m \geq 1$ components ($m > 1$ if $v_{n+1}$
was a cut vertex). By the induction hypothesis, there is a spanning
tree for each component of $G'$ since each of these components has
at most $n$ vertices. Let $T_1,\dots,T_m$ be the spanning trees for these
components.

Let $T'$ be the acyclic subgraph of $G$ consisting of all the
components’ spanning trees. For each spanning tree $T_i$, choose exactly one
edge from $E$ of the form $e = \{v_{n+1}, v\}$, where $v$ is a vertex
in component $i$, and add this edge (along with the vertex $v_{n+1}$) to
$T'$ to create the tree $T$. (See
Fig. 4.5.) It is easy to see that no cycle is created in $T$ through these
operations because, by construction, each edge $e$ is a cut edge, and
by Corollary 3.44, it cannot lie on a cycle. The graph $T$ contains
every vertex of $G$ and is connected and acyclic. Therefore, it is a
spanning tree of $G$. The theorem follows by induction.


Fig. 4.5 Adding edges back in that were removed in the creation of the $m$
components does not cause a cycle to form. The result is a spanning tree of the
connected graph.
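
The induction in this proof is constructive, so it can be turned into a recursive procedure. The sketch below is one possible rendering, not an efficient algorithm: it assumes a simple connected graph given as a vertex list and a list of edges (pairs of distinct vertices), removes the last vertex, recursively builds a spanning tree in each remaining component, and reattaches the removed vertex with one edge per component. The function names are hypothetical.

```python
def components(vertices, edges):
    """Return the connected components of (vertices, edges) as sets."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        comp, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in adj[u] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def spanning_tree(vertices, edges):
    """Edges of a spanning tree of a connected graph.

    Follows the induction: delete one vertex, span each component of
    what remains, then add back one edge from the deleted vertex to
    each component.
    """
    vertices = list(vertices)
    if len(vertices) == 1:
        return []  # a single vertex is a degenerate tree
    removed = vertices[-1]
    rest = vertices[:-1]
    kept = [(u, w) for u, w in edges if removed not in (u, w)]
    tree = []
    for comp in components(rest, kept):
        comp_edges = [(u, w) for u, w in kept if u in comp]
        tree += spanning_tree(list(comp), comp_edges)
        # Connectivity guarantees at least one such edge exists.
        link = next((u, w) for u, w in edges
                    if removed in (u, w) and (u in comp or w in comp))
        tree.append(link)
    return tree


if __name__ == "__main__":
    V = [1, 2, 3, 4]
    E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
    print(spanning_tree(V, E))
```

If the input graph is not connected, the search for a reattaching edge fails, which matches the hypothesis of Theorem 4.16.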


Corollary 4.17. Every graph G = (V,E) has a spanning forest 
F=(V,E’). 


Definition 4.18 (Leaf). Let $T = (V,E)$ be a tree. If $v \in V$ and
$\deg(v) = 1$, then $v$ is called a leaf of $T$.


Lemma 4.19. Every tree with at least one edge has at least two leaves.


Proof. Let
\[
w = (v_1, e_1, v_2, \dots, v_n, e_n, v_{n+1})
\]
be a path of maximal length in $T$. Consider vertex $v_{n+1}$. If
$\deg(v_{n+1}) > 1$, then there are two possibilities, as follows: (i) There
is an edge $e_{n+1}$ and a vertex $v_{n+2}$ with $v_{n+2}$ not in the sequence $w$.
In this case, we can extend $w$ to $w'$ defined as
\[
w' = (v_1, e_1, v_2, \dots, v_n, e_n, v_{n+1}, e_{n+1}, v_{n+2}),
\]
which contradicts our assumption that $w$ was maximal in length.
(ii) There is an edge $e_{n+1}$ and a vertex $v_{n+2}$ and for some
$k \in \{1,\dots,n\}$, $v_{n+2} = v_k$; i.e., $v_{n+2}$ is in the sequence $w$. In this
case, there is a closed sub-walk
\[
w' = (v_k, e_k, v_{k+1}, \dots, v_{n+1}, e_{n+1}, v_{n+2}).
\]
Since $w$ is a path, there are no other repeated vertices in the sequence
$w'$; thus, $w'$ is a cycle in $T$, contradicting our assumption that $T$ was
a tree. The reasoning above holds for vertex $v_1$ as well; thus, the two
end points of every maximal path in a nondegenerate tree must be
leaves. This completes the proof.


Corollary 4.20. Let G = (V,E) be a graph. If each vertex in V has 
a degree of at least 2, then G contains a cycle. 


Lemma 4.21. Let $T = (V,E)$ be a tree with $|V| = n$. Then, $|E| =
n-1$.




Proof. We proceed by induction. For the case when $n = 1$, this
statement must be true. Now, suppose that the statement is true for
$|V| \leq n$. We show that when $|V| = n+1$, then $|E| = n$, assuming that
$T = (V,E)$ is a tree. By Lemma 4.19, we know that if $T$ is a tree, then
it contains one component and at least two leaves. Therefore, choose
a vertex $v \in V$ that is a leaf in $T$. There is some edge $e = \{v', v\} \in E$.
Consider the graph $T' = (V', E')$ with $V' = V \setminus \{v\}$ and $E' = E \setminus \{e\}$.
This new graph $T'$ must:


(1) have one component since v was connected to only one other 
vertex v’ € V and T had only one component; and 

(2) be acyclic since T itself was acyclic and we have not introduced 
new edges to create a cycle. 


Therefore, $T'$ is a tree with $n$ vertices, and by the induction
hypothesis, it must contain $n-1$ edges. Since we removed exactly one edge
(and one vertex) to construct $T'$ from $T$, it follows that $T$ had exactly
$n$ edges and our originally assumed $n+1$ vertices. The required result
follows immediately from induction.


Corollary 4.22. If $G = (V,E)$ is a forest with $n$ vertices, then $G$
has n — c(G) edges. (Recall that c(G) is the number of components 
in G.) 


Theorem 4.23. A graph G = (V,E) is connected if and only if it 
has a spanning tree. 


Theorem 4.24 (Tree characterization theorem). Let $T =
(V,E)$ be a graph with $|V| = n$. Then, the following are equivalent:


(1) $T$ is a tree.

(2) $T$ is acyclic and has exactly $n-1$ edges.

(3) $T$ is connected and has exactly $n-1$ edges.

(4) $T$ is connected and every edge is a cut edge.

(5) Any two vertices of $T$ are connected by exactly one path.

(6) $T$ is acyclic and the addition of any new edge creates exactly one
cycle in the resulting graph.


Proof. (1 $\Rightarrow$ 2) Assume $T$ is a tree. Then, by definition, $T$ is
acyclic and the fact that it has $n-1$ edges follows from Lemma 4.21.




(2 $\Rightarrow$ 3) Since $T$ is acyclic, it must be a forest, and by
Corollary 4.22, $|E| = n - c(T)$. Since we assumed that $T$ has $n-1$ edges,
we must have $n - c(T) = n - 1$; thus, the number of components of
$T$ is 1 and $T$ must be connected.

(3 $\Rightarrow$ 4) The fact that $T$ is connected is assumed from (3).
Suppose we consider the graph $T' = (V, E')$, where $E' = E \setminus \{e\}$
for an arbitrary edge $e \in E$.
Then, the number of edges in $T'$ is $n-2$. The graph $T'$ contains
$n$ vertices and must still be acyclic (that is, a forest); therefore, $n-2 =
n - c(T')$. Thus, $c(T') = 2$ and $e$ is a cut edge.

(4 $\Rightarrow$ 5) Choose two vertices $v$ and $v'$ in $V$. The fact that there
is a path between $v$ and $v'$ is guaranteed by our assumption that $T$ is
connected. By way of contradiction, suppose that there are at least
two paths from $v$ to $v'$ in $T$. These two paths must diverge at some
vertex $w \in V$ and recombine at some other vertex $w'$. (See Fig. 4.6.)
We can construct a cycle in $T$ by beginning at vertex $w$, following
the first path to $w'$, and then following the second path back to $w$
from $w'$.

By Theorem 3.43, removing any edge in this cycle cannot result
in a disconnected graph. Thus, no edge in the constructed cycle is
a cut edge, contradicting our assumption about $T$. Thus, two paths
connecting $v$ and $v'$ cannot exist.

(5 $\Rightarrow$ 6) The fact that any pair of vertices is connected in
$T$ implies that $T$ is connected (i.e., it has one component). Now,
suppose that $T$ has a cycle (like the one illustrated in Fig. 4.6).
Then, it is easy to see that there are (at least) two paths connecting
$w$ and $w'$, contradicting our assumption. Therefore, $T$ is acyclic. The
fact that adding an edge creates exactly one cycle can be seen in the
following way: Consider two vertices $v$ and $v'$ and suppose the edge
$\{v,v'\}$ is not in $E$. We know that there is a path
\[
(v, \{v, u_1\}, u_1, \dots, u_n, \{u_n, v'\}, v')
\]
in $T$ connecting $v$ and $v'$ and it is unique. Adding the edge $\{v,v'\}$
creates the cycle
\[
c_1 = (v, \{v, u_1\}, u_1, \dots, u_n, \{u_n, v'\}, v', \{v, v'\}, v),
\]
so at least one cycle is created. To see that this cycle is unique, note
that if there is another cycle present, then it must contain the edge
$\{v,v'\}$. Suppose that this cycle is
\[
c_2 = (v, \{v, w_1\}, w_1, \dots, w_n, \{w_n, v'\}, v', \{v, v'\}, v),
\]
where there is at least one vertex $w_i$ not present in the set
$\{u_1,\dots,u_n\}$ (otherwise, the two cycles are identical). We now see
that there must be two distinct paths connecting $v$ and $v'$, namely
\[
(v, \{v, u_1\}, u_1, \dots, u_n, \{u_n, v'\}, v')
\]
and
\[
(v, \{v, w_1\}, w_1, \dots, w_n, \{w_n, v'\}, v').
\]
This contradicts our assumption about $T$. Thus, the created cycle is
unique.


Fig. 4.6 The proof of 4 $\Rightarrow$ 5 requires us to assume the existence of two paths
in graph $T$ connecting vertex $v$ to vertex $v'$. This assumption implies the existence
of a cycle, contradicting our assumptions about $T$.

(6 $\Rightarrow$ 1) It suffices to show that $T$ has a single component.
Suppose that it is not so; there are at least two components of $T$.
Choose two vertices $v$ and $v'$ in $V$ so that these two vertices are not
in the same component. Then, the edge $e = \{v,v'\}$ is not in $E$, and
adding it to $E$ cannot create a cycle. To see why, note that if $T'$ is the
graph that results from the addition of $e$, then $e$ is now a cut edge.
Applying Corollary 3.44, we see that $e$ cannot lie on a cycle; thus,
the addition of this edge does not create a cycle, contradicting our
assumption about $T$. Thus, $T$ must have a single component. Since
it is acyclic and connected, $T$ is a tree. This completes the proof.
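
The equivalence of the first three conditions of Theorem 4.24 can be checked exhaustively on small graphs. The following sketch is an illustration rather than a proof. It assumes simple graphs on the vertex set {0, 1, 2, 3}, uses a union-find structure (a technique chosen here, not one from the text) to detect cycles and count components, and confirms that conditions (1), (2), and (3) agree on all 64 such graphs. All names are hypothetical.

```python
from itertools import combinations


def analyze(n, edges):
    """Return (acyclic, components) for a graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    acyclic = True
    comps = n
    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru == rw:
            acyclic = False  # u and w were already joined by a path
        else:
            parent[ru] = rw
            comps -= 1
    return acyclic, comps


def conditions(n, edges):
    """Truth values of conditions (1), (2), and (3) of Theorem 4.24."""
    acyclic, comps = analyze(n, edges)
    connected = comps == 1
    m = len(edges)
    return (acyclic and connected,      # (1) T is a tree
            acyclic and m == n - 1,     # (2) acyclic with n - 1 edges
            connected and m == n - 1)   # (3) connected with n - 1 edges


n = 4
all_edges = list(combinations(range(n), 2))
graphs = [[e for i, e in enumerate(all_edges) if (mask >> i) & 1]
          for mask in range(2 ** len(all_edges))]
agree = all(len(set(conditions(n, g))) == 1 for g in graphs)

if __name__ == "__main__":
    print("conditions (1)-(3) agree on every graph:", agree)
```

Conditions (4) through (6) could be tested the same way at the cost of more code. The three chosen here are the ones most often used in practice to recognize a tree.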


Definition 4.25 (Tree-graphic sequence). Recall from
Definition 2.13 that a tuple $d = (d_1,\dots,d_n)$ is graphic if there exists
a graph $G$ with a degree sequence of $d$. The tuple $d$ is tree-graphic
if it is both graphic and there exists a tree with a degree sequence
of $d$.


Theorem 4.26. A degree sequence $d = (d_1,\dots,d_n)$ is tree-graphic
if and only if
\[
\sum_{i=1}^{n} d_i = 2n - 2. \tag{4.5}
\]


Remark 4.27. One direction of the proof is left as an exercise. 


Proof. ($\Leftarrow$) Now, suppose that Eq. (4.5) holds. If $n = 1$, then
$d_1 = 0$, and this is a degenerate tree (with one vertex). We now
proceed by induction to establish the remainder of the theorem. If
$n = 2$, $2n - 2 = 2$, and if $d_1, d_2 > 0$, then $d_1 = d_2 = 1$ by necessity.
This is the degree sequence for a tree with two vertices joined by a
single edge; thus, it is a tree-graphic degree sequence. Now, assume
the statement holds for all integers up to some $n$. We show that it is
true for $n+1$. Consider a degree sequence $(d_1,\dots,d_{n+1})$ such that
\[
\sum_{i=1}^{n+1} d_i = 2(n+1) - 2 = 2n.
\]
We assume that the degrees are ordered (largest first) and positive.
Therefore, $d_1 \geq 2$ (because otherwise, $d_1 + \cdots + d_{n+1} \leq n+1$). We also
assume that $d_1 \leq n$. Note that in the case where $d_1 = n$, we must have
$d_2 = d_3 = \cdots = d_{n+1} = 1$. Moreover, if $d_1 = d_2 = \cdots = d_{n-1} = 2$,
then $d_n = d_{n+1} = 1$. Since $d_i > 0$ for $i = 1,\dots,n+1$, from the
previous two facts, we see that for any positive value of $d_1$, we must
have $d_n = d_{n+1} = 1$ in order to ensure that $d_1 + d_2 + \cdots + d_{n+1} = 2n$.
Consider the sequence of degrees
\[
d' = (d_1 - 1, d_2, \dots, d_n).
\]
Since $d_{n+1} = 1$, we can see that $(d_1 - 1) + d_2 + \cdots + d_n = 2n - 2$.
Thus, a permutation of $d'$ to correct the order leads to a tree-graphic
sequence by the induction hypothesis. Let $T'$ be the tree that results
from this tree-graphic degree sequence, and let $v_1$ be the vertex with
a degree of $d_1 - 1$ in $T'$. Then, by adding a new vertex $v_{n+1}$ to $T'$
along with edge $\{v_1, v_{n+1}\}$, we have constructed a tree $T$ with the
original degree sequence of $d$. This new graph $T$ must be connected
since $T'$ was connected, and we have connected $v_{n+1}$ to a vertex in
$T'$. The new graph must be acyclic since $v_{n+1}$ does not appear in $T'$.
The result follows by induction.
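
The induction above also describes how to build a tree from a suitable degree sequence. The sketch below follows the same idea under stated assumptions: the input is a list of positive integers summing to $2n-2$ with $n \geq 2$, vertices are labeled $0,\dots,n-1$, a vertex of required degree 1 is set aside, the largest required degree is reduced by one, and the leaf is attached after the smaller tree is built. The function names and the sample sequence are illustrative choices.

```python
def tree_from_degrees(degrees):
    """Return the edge list of a tree whose vertex i has degree degrees[i]."""
    n = len(degrees)
    if n < 2 or min(degrees) < 1 or sum(degrees) != 2 * n - 2:
        raise ValueError("need n >= 2 positive degrees summing to 2n - 2")
    return _build(dict(enumerate(degrees)))


def _build(remaining):
    """Recursive step; remaining maps vertex label -> required degree."""
    if len(remaining) == 2:
        # Two positive degrees summing to 2: a single edge.
        a, b = remaining
        return [(a, b)]
    # With at least three vertices, some vertex must have degree 1 and
    # the largest degree must be at least 2.
    leaf = next(v for v, d in remaining.items() if d == 1)
    hub = max(remaining, key=remaining.get)
    smaller = {v: d for v, d in remaining.items() if v != leaf}
    smaller[hub] -= 1
    # Build the smaller tree, then hang the leaf on the hub.
    return _build(smaller) + [(hub, leaf)]


if __name__ == "__main__":
    edges = tree_from_degrees([3, 3, 1, 1, 1, 1])
    print(edges)
```

The tree produced is one of possibly many with the given degree sequence, since the choice of leaf and of the vertex it attaches to is not unique.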

Remark 4.28. Suppose we have the walk
\[
w = (v_1, e_1, v_2, \dots, v_n, e_n, v_{n+1}).
\]
If for some $m \in \{1,\dots,n\}$ and for some positive $k \in \mathbb{Z}$, we have
$v_m = v_{m+k}$, then
\[
w' = (v_m, e_m, \dots, e_{m+k-1}, v_{m+k})
\]
is a closed sub-walk of $w$. The walk $w'$ can be deleted from the walk
$w$ to obtain a new walk
\[
w'' = (v_1, e_1, v_2, \dots, v_{m+k}, e_{m+k}, v_{m+k+1}, \dots, v_n, e_n, v_{n+1})
\]
that is shorter than the original walk. This is illustrated in Fig. 4.7.
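
The deletion described in this remark is easy to express on a list of vertices. The following sketch assumes a simple graph, so that a walk is determined by its vertex sequence alone (an assumption made here to keep the example short). It finds the first repeated vertex and removes the closed sub-walk between its two occurrences. The function name is hypothetical.

```python
def remove_closed_subwalk(walk):
    """Delete one closed sub-walk from a walk given as a vertex list.

    The first vertex that repeats determines the sub-walk. If no vertex
    repeats, the walk is already a path and is returned unchanged.
    """
    first_seen = {}
    for j, v in enumerate(walk):
        if v in first_seen:
            i = first_seen[v]
            # walk[i] == walk[j], so dropping walk[i:j] keeps a valid walk.
            return walk[:i] + walk[j:]
        first_seen[v] = j
    return walk


if __name__ == "__main__":
    print(remove_closed_subwalk([1, 2, 3, 4, 2, 5]))
```

In the sample walk, the vertices 2, 3, 4, 2 form a closed sub-walk, and removing it leaves the shorter walk 1, 2, 5. Applying the function repeatedly until nothing changes reduces any walk to a path between its end points.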


Lemma 4.29. Let G = (V,E) be a graph, and suppose that t is a 
nontrivial tour (closed trail) in G. Then, t contains a cycle. 


Proof. The fact that $t$ is closed implies that it contains at least
one pair of repeated vertices. Therefore, a closed sub-walk of $t$ must
exist since $t$ itself has these repeated vertices. Let $c$ be a minimal
(length) closed sub-walk of t. We show that c must be a cycle. By 
way of contradiction, suppose that c is not a cycle. Then, since it is 
closed, it must contain a repeated vertex (that is not its first ver- 
tex). If we applied our observation from Remark 4.28, we could pro- 
duce a smaller closed walk c’, contradicting our assumption that c 
was minimal. Thus, c must have been a cycle. This completes the 
proof. 


Theorem 4.30. Let G = (V,E) be a graph and suppose that t is 
a nontrivial tour (closed trail). Then, t is composed of edge-disjoint 
cycles. 


Fig. 4.7 We can create a new walk from an existing walk by removing closed
sub-walks from the walk.

Proof. We proceed by induction on the length of the tour. In the
base case, assume that $t$ is a one-edge closed tour; then $G$ is a
non-simple graph that contains a self-loop, and this is a single edge in $t$;
thus, $t$ is a (non-simple) cycle.† Now, suppose the theorem holds for
all closed trails of length $N$ or less. We show that the result holds for a
tour of length $N+1$. Applying Lemma 4.29, we know that there is at
least one cycle $c$ in $t$. If we remove $c$ from this tour, then we obtain
a new tour $t'$ (containing all the remaining edges). We note some
vertices may be disconnected. These can be ignored. The tour $t'$ has
a length of at most $N$. We can now apply the induction hypothesis
to see that this new tour $t'$ is composed of edge-disjoint cycles. From this
construction, we conclude that $t$ is composed of edge-disjoint cycles.
The theorem is illustrated in Fig. 4.8. This completes the proof.


Remark 4.31. This final theorem completely characterizes Eulerian 
graphs. We use results derived from our study of trees to prove the 
following theorem. 


Theorem 4.32 (Eulerian graph characterization theorem).
Let $G = (V,E)$ be a nonempty, nontrivial connected graph. Then,
the following are equivalent:


(1) $G$ is Eulerian.
(2) The degree of every vertex in $G$ is even.
(3) The set $E$ is the union of the edge sets of a collection of
edge-disjoint cycles in $G$.


Fig. 4.8 We show how to decompose a (Eulerian) tour into an edge-disjoint set
of cycles, thus illustrating Theorem 4.30.


† If we assume that $G$ is simple, then the base case begins with $t$ having a length
of 3. In this case, it is a 3-cycle.


Proof. (1 $\Rightarrow$ 2) Assume $G$ is Eulerian; then there is an Eulerian
tour $t$ of $G$. Let $v$ be a vertex in $G$. Each time $v$ is traversed while
following the tour, we must enter $v$ by one edge and leave by another.
Thus, $v$ must have an even number of edges incident to it. If $v$ is
the initial (and final) vertex in the tour, then we leave v in the very 
first step of the tour and return in the last step; thus, the initial (and 
final) vertex of the tour must also have an even degree. Thus, every 
vertex has an even degree. 

(2 $\Rightarrow$ 3) Since $G$ is connected and every vertex has an even
degree, it follows that the degree of each vertex is at least 2. Applying
Corollary 4.20, we see that $G$ must contain a cycle $C$. If this cycle
includes every edge in $G$, then (3) is established. Suppose otherwise.
Consider the graph $G'$ obtained by removing all edges in $C$. If we
consider $C$ as a subgraph of $G$, then each vertex in $C$ has exactly two
edges incident to it. Thus, if $v$ is a vertex in the cycle, then removing
those edges in $C$ that are incident to it will result in a vertex $v$ having
two fewer edges in $G'$ than it did in $G$. Since we assumed that every
vertex in $G$ had an even degree, it follows that every vertex in $G'$
must also have an even degree since we removed either two or zero
edges from each vertex in $G$ to obtain $G'$. We can repeat the previous
process of constructing a cycle in $G'$ and, if necessary, forming $G''$.
Since there are a finite number of edges in $G$, this process must stop
at some point, and we will be left with a collection of edge-disjoint
cycles $\mathcal{C} = \{C, C', \dots\}$ whose union is the entire edge set of $G$.

(3 $\Rightarrow$ 1) Assume that $G$ is connected and that its edge set
is the union of a collection of edge-disjoint cycles. We proceed by
induction on the number of cycles. If there is only one cycle, then
we simply follow this cycle in either direction to obtain a tour of
$G$. Now, suppose that the statement is true for a graph whose edge
set is the union of at most $n$ edge-disjoint cycles. We’ll show that the
statement is true for a graph whose edge set is composed of $n+1$
edge-disjoint cycles. Denote the cycles as $C_1,\dots,C_{n+1}$. A subgraph
$G'$ of $G$ composed of only cycles $C_1,\dots,C_n$ will have $m$ components,
with $1 \leq m \leq n$. Each component is composed of at most $n$
edge-disjoint cycles; therefore, applying the induction hypothesis, each has
a tour. Denote the components as $K_1,\dots,K_m$. The cycle $C_{n+1}$ shares
at least one vertex in common with at least one of these components (and
perhaps all of them). Without loss of generality, assume that $K_1$ is
a component sharing a vertex in common with $C_{n+1}$ (if not, reorder
the components to make this true). Begin following the tour around
$K_1$ until we encounter the vertex $v_1$ that component $K_1$ and $C_{n+1}$
share. At this point, break the tour of $K_1$ and begin traversing $C_{n+1}$
until (i) we return to $v_1$ or (ii) we encounter a vertex $v_2$ that is shared
by another component (say $K_2$). In case (i), we complete the tour
of $K_1$, and necessarily, we must have completed a tour of the entire
graph since it is connected. In case (ii), we follow the tour of $K_2$
until we return to $v_2$ and then continue following $C_{n+1}$ until either
case (i) occurs or case (ii) occurs again. In either case, we apply the
same logic as before. Since there are a finite number of components,
this process will eventually terminate with case (i). We then complete the
tour of $K_1$, and thus, we have constructed a tour of the entire
graph. Figure 4.8 also serves to illustrate this theorem.
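
The splicing of cycles in the proof of (3 $\Rightarrow$ 1) is the idea behind a standard procedure usually called Hierholzer's algorithm, which is named here for reference and is not taken from the text. The sketch below is a minimal version under stated assumptions: the graph is connected, has no self-loops, and is given as an adjacency dictionary in which each edge appears in both end points' lists. It first checks condition (2) of Theorem 4.32 and then builds a tour. The function name and the sample graph (two triangles sharing vertex 1) are illustrative choices.

```python
def eulerian_tour(adj, start):
    """Return an Eulerian tour as a list of vertices, beginning at start.

    adj maps each vertex to a list of neighbors. The graph is assumed to
    be connected, and every vertex must have even degree.
    """
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        raise ValueError("some vertex has odd degree; no Eulerian tour exists")
    remaining = {u: list(nbrs) for u, nbrs in adj.items()}  # unused edges
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        if remaining[u]:
            # Follow an unused edge out of u and mark it used at both ends.
            w = remaining[u].pop()
            remaining[w].remove(u)
            stack.append(w)
        else:
            # Every edge at u is used, so u is finalized in the tour.
            tour.append(stack.pop())
    return tour


if __name__ == "__main__":
    bowtie = {1: [2, 3, 4, 5], 2: [1, 3], 3: [1, 2], 4: [1, 5], 5: [1, 4]}
    print(eulerian_tour(bowtie, 1))
```

The stack plays the role of the partially followed tour in the proof: whenever the walk returns to a vertex with unused edges, a further cycle is traversed before the earlier one is completed.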


4.4 Chapter Notes 


Bipartite graphs appear frequently in applications, and we will see 
them again in Chapter 6, when we study matching problems. They 
occur in optimization in assignment problems (such as assigning jobs 
to workers) and in transportation problems, in which the flow of 
commodities from warehouses to stores is optimized. Reference [29] 
has a complete introduction to this subject. Bipartite graphs also 
appear in coding theory [32]. They are also used to describe Petri nets 
[33], which can be used to define simple discrete dynamical systems. 
Bipartite graphs are particularly useful in social network analysis 
[34] for modeling individuals and their attributes. For example, the 
bipartite graph between users and the websites they visit may be 
mined to obtain socially relevant information. 

Trees are ubiquitous in computer science. They appear in several
algorithms that are fundamental to combinatorial optimization,
as we’ll discuss in Chapter 5. Reference [28] contains an extensive
introduction to algorithms, many of which use trees. Knuth also
provides detailed information on trees and computer science [35] in his
multivolume work on programming. Family trees are a popular
application of trees in everyday use.

Eulerian graphs and Eulerian trails/tours are used in bioinformatics
[36] as well as CMOS circuit design [37]. Recently, Eulerian tours
have also been used as a mechanism for displaying information [38]. 
Chapter 17 of Godsil and Royle’s work [39] discusses the interesting 
relationship between knots (mathematical abstractions of shoelace 
knots) and Eulerian tours. Also, the classic route inspection problem 
(also known as the Chinese postman problem) has, at its core, the 
problem of finding an optimal Eulerian tour [40]. 


4.5 Exercises 


Exercise 4.1 
A graph has a cycle with a length of 15. Is it bipartite? Why? 


Exercise 4.2 
Prove Proposition 4.12. [Hint: Use Theorem 4.8.] 


Exercise 4.3 
Prove Corollary 4.17. 


Exercise 4.4 
Prove Corollary 4.20. 


Exercise 4.5 
Prove Corollary 4.22. 


Exercise 4.6 
Prove Theorem 4.23. 


Exercise 4.7 
Decide whether the following degree sequence is tree-graphic:
$(3,2,2,1,1,1)$. If it is, draw a tree with this degree sequence.


Exercise 4.8 
Draw a graph that is Eulerian but has no Hamiltonian cycle. Explain the
reason for your answer.


Exercise 4.9 
The sequence (6, 2, 2, 2, 2, 2, 2, 2, 2) is graphic. Is a graph with
this degree sequence Eulerian? Why?


Exercise 4.10
Prove the necessity part ($\Rightarrow$) of Theorem 4.26. [Hint: Use
Theorem 2.10.]


Exercise 4.11 
Show by example that Theorem 4.32 does not necessarily hold if we 
are only interested in Eulerian trails.


Chapter 5


Trees, Algorithms, and Matroids 


Remark 5.1 (Chapter goals). We study algorithms in this chap- 
ter. We focus on graph search algorithms in the form of breadth- 
and depth-first searches. We then discuss minimum spanning trees. 
Algorithms for finding minimum distance paths in graphs are then 
considered. We show an interesting application to currency trading 
that uses the Floyd–Warshall algorithm. The chapter concludes with
a more theoretical treatment of matroids. 


5.1 Two-Tree Search Algorithms 


Remark 5.2. For the remainder of this section, we use the following, 
rather informal, definition of an algorithm: An algorithm is a set of 
steps (or operations) that can be followed to achieve a certain goal. 
We can think of an algorithm as taking an input x, from which we
obtain an output y. See Ref. [41] for a formal definition of algorithms
in terms of Turing machines. 


Remark 5.3. Tree searching is the process of enumerating the ver- 
tices of a tree (for the purpose of “searching” them). One can consider 
this process as generating a walk containing all the vertices of the 
tree at least once or as a way to create a sequence of the vertices. In 
this section, we take the latter view, though it will be clear how to 
create a walk as well. 

The first algorithm, called breadth-first search (BFS), explores 
the vertices of a tree starting from a given vertex by exploring all the
neighbors of this given vertex, then the neighbors of the neighbors,
and so on until all the vertices have been encountered. The algorithm 
in pseudocode is shown in Algorithm 1. 


Breadth-First Search on a Tree

Input: $T = (V, E)$ a tree, $v_0$ a starting vertex
Initialize: $F = (v_0)$ {a sequence of vertices to enumerate}
Initialize: $F_{\text{next}} = ()$ {the sequence of next vertices to enumerate}
Initialize: $w = ()$ {the sequence of vertices traversed}

 (1) while $F \neq \emptyset$
 (2)   for each $v \in F$ do
 (3)     Remove $v$ from $F$
 (4)     Append $v$ to $w$
 (5)     for each $v' \in N(v)$ do
 (6)       if $v' \notin w$ then
 (7)         Append $v'$ to $F_{\text{next}}$
 (8)       end if
 (9)     end for
(10)   end for
(11)   $F = F_{\text{next}}$
(12)   $F_{\text{next}} = ()$
(13) end while

Output: $w$ {a breadth-first sequence of vertices in T}


Algorithm 1: Breadth-first search.
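
The pseudocode in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the book's implementation: the adjacency-dictionary encoding, the function name bfs_order, and the small example tree are assumptions made here. The sketch also marks a vertex as seen when it is queued rather than when it is appended to w, which gives the same sequence on a tree.

```python
def bfs_order(adj, v0):
    """Return the vertices of a tree in breadth-first order starting at v0.

    adj maps each vertex to a list of its neighbors (an assumed encoding).
    """
    frontier = [v0]   # F: vertices to enumerate in this round
    order = []        # w: vertices traversed so far
    seen = {v0}       # vertices already queued, for fast membership tests
    while frontier:
        next_frontier = []  # F_next
        for v in frontier:
            order.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return order


# A hypothetical tree with edges a-b, a-c, b-d, c-e.
tree = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "e"], "d": ["b"], "e": ["c"]}
print(bfs_order(tree, "a"))  # ['a', 'b', 'c', 'd', 'e']
```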


Example 5.4. Figure 5.1 shows the order in which the vertices are 
added to w during a BFS of the tree. We start at Vertex a as Vertex 1 
and explore its immediate neighbors, adding Vertices b and c (2 and 
3) to our list of vertices. From there, we explore the neighbors of 
Vertices b and c to obtain Vertices d and e as the fourth and fifth 
vertices enumerated. 


Proposition 5.5. A BFS of a tree T = (V,E) enumerates all 
vertices. 


Fig. 5.1 The breadth-first walk of a tree explores the tree in an ever-widening
pattern.

Proof. We proceed by induction on the number of vertices. If $T$ has one
vertex, then clearly $v_0$ in the algorithm is that vertex. The vertex is
added to $w$ in the first iteration of the while loop at Line 1 and
$F_{\text{next}}$ is empty; thus, the algorithm terminates. Now, suppose
that the statement is true for all trees with at most $n$ vertices. We
show that the statement is true for a tree $T$ with $n + 1$ vertices. To
see this, construct a new tree $T'$ by removing from $T$ a leaf vertex
$v' \neq v_0$ (a tree with at least two vertices has at least two leaves,
so such a leaf exists). By the induction hypothesis, the algorithm must
enumerate every vertex in $T'$; therefore, there is a point at which we
reach Line 3 with the vertex $v$ that is adjacent to $v'$ in $T$. At this
point, $v'$ would be added to $F_{\text{next}}$, and it would be added
to $w$ in the next execution through the while loop since $F \neq \emptyset$
the next time. Thus, every vertex of $T$ must be enumerated in $w$. This
completes the proof.


Remark 5.6. BFS can be modified for directed trees in an obvious 
way. Necessarily, we need $v_0$ to be strongly connected to every other
vertex in order to ensure that BFS enumerates every possible vertex. 


Remark 5.7. Another algorithm for enumerating the vertices of a 
tree is the depth-first search (DFS) algorithm. This algorithm works 
by descending into the tree as deeply as possible (until a leaf is iden- 
tified) and then working back up. We present DFS as a recursive 
algorithm in Algorithm 2. A recursive algorithm is an algorithm that 
calls itself during its own execution. 


Depth-First Search on a Tree

Input: $T = (V, E)$ a tree, $v_0$ a starting vertex
Initialize: $v_{\text{now}} = v_0$ {the current vertex}
Initialize: $w = (v_0)$ {the sequence of vertices enumerated}

(1) Recurse($T$, $v_{\text{now}}$, $w$)

Output: $w$ {a depth-first sequence of vertices in T}


Recurse
Input: $T = (V, E)$ a tree, $v_{\text{now}}$ current vertex, $w$ the sequence

(1) for each $v \in N(v_{\text{now}})$ do
(2)   if $v \notin w$ then
(3)     Append $v$ to $w$
(4)     Recurse($T$, $v$, $w$)
(5)   end if
(6) end for


Algorithm 2: Depth-first search.

Fig. 5.2 The depth-first walk of a tree explores the tree in an ever-deepening
pattern.
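
A recursive Python sketch of Algorithm 2 is given below. The adjacency-dictionary encoding and the name dfs_order are assumptions made for illustration. The example tree has an edge set chosen here to be consistent with the visiting order described in Example 5.8; it is not taken from Fig. 5.2.

```python
def dfs_order(adj, v0):
    """Return the vertices reachable from v0 in depth-first order."""
    order = [v0]   # w: the sequence of vertices enumerated
    seen = {v0}    # set copy of w for fast membership tests

    def recurse(v_now):
        for v in adj[v_now]:
            if v not in seen:
                seen.add(v)
                order.append(v)
                recurse(v)  # descend before trying the next neighbor

    recurse(v0)
    return order


# Assumed edges: 1-2, 2-3, 2-4, 1-5.
tree = {1: [2, 5], 2: [1, 3, 4], 3: [2], 4: [2], 5: [1]}
print(dfs_order(tree, 1))  # [1, 2, 3, 4, 5]
```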


Example 5.8. Figure 5.2 shows the order the vertices are added to 
w during a DFS of the tree. We start at Vertex 1 and then explore 
down to Vertex 3 before stopping because Vertex 3 is a leaf. We then 
go back to Vertex 2 and explore from there down to Vertex 4 (which 
is a leaf). We then return back to Vertex 1 and explore down from 
there to Vertex 5. 


Proposition 5.9. A DFS of a tree T = (V,E) enumerates all ver- 
tices in w. 


Remark 5.10. We note that BFS and DFS can be trivially mod- 
ified to search through connected graph structures and construct 
spanning trees for these graphs. We also note that BFS and DFS can 
be modified to function on directed trees (and graphs) and that all 
vertices will be enumerated provided that every vertex is reachable 
by a directed path from $v_0$.


Remark 5.11. In terms of implementation, we note that the recursive
implementation of DFS works on most computing systems, provided the
longest path in the graph in question does not exceed some specified
length. This is because most language runtimes and operating systems
cap the depth of nested recursive calls, typically through a limit on
the size of the call stack.
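
One common way around the recursion limit is to keep an explicit stack. The sketch below is an illustration under assumptions (the function name dfs_order_iterative and the adjacency-dictionary encoding are invented here). In Python, for instance, deep recursion raises RecursionError once the nesting exceeds the value reported by sys.getrecursionlimit(), while the loop below has no such restriction.

```python
def dfs_order_iterative(adj, v0):
    """Depth-first order from v0 using an explicit stack instead of recursion."""
    order = []
    seen = set()
    stack = [v0]
    while stack:
        v = stack.pop()
        if v in seen:
            continue  # v was reached earlier by another route
        seen.add(v)
        order.append(v)
        # Push neighbors in reverse so the first listed neighbor is explored first.
        for u in reversed(adj[v]):
            if u not in seen:
                stack.append(u)
    return order


# A path with 100,000 vertices: far deeper than a typical recursion limit.
n = 100_000
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
print(len(dfs_order_iterative(path, 0)))  # 100000
```

On a tree, this produces the same sequence as the recursive version when neighbors are listed in the same order.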


5.2 Building a BFS/DFS Spanning Tree 


Remark 5.12. We can also build a spanning tree using a BFS or 
DFS on a graph. These algorithms are shown in Algorithms 3 and 4.
Note that instead of just appending vertices to w, we also grow a
tree that will eventually span the input graph G (provided that G is
connected).


Breadth-First Search Spanning Tree

Input: $G = (V, E)$ a graph, $v_0$ a starting vertex
Initialize: $F = (v_0)$ {a sequence of vertices to enumerate}
Initialize: $F_{\text{next}} = ()$ {the sequence of next vertices to enumerate}
Initialize: $w = ()$ {the sequence of vertices traversed}
Initialize: $T = (V, E')$ with $E' = \emptyset$ {the tree returned}

 (1) while $F \neq \emptyset$
 (2)   for each $v \in F$ do
 (3)     Remove $v$ from $F$
 (4)     Append $v$ to $w$
 (5)     for each $v' \in N(v)$ do
 (6)       if $v' \notin w$, $v' \notin F$, and $v' \notin F_{\text{next}}$ then {each vertex is queued once}
 (7)         Append $v'$ to $F_{\text{next}}$
 (8)         Add $\{v, v'\}$ to $E'$
 (9)       end if
(10)     end for
(11)   end for
(12)   $F = F_{\text{next}}$
(13)   $F_{\text{next}} = ()$
(14) end while

Output: $T$ {a breadth-first spanning tree of G}


Algorithm 3: Breadth-first search spanning tree.


Depth-First Search Spanning Tree

Input: $G = (V, E)$ a graph, $v_0$ a starting vertex
Initialize: $v_{\text{now}} = v_0$ {the current vertex}
Initialize: $w = (v_0)$ {the sequence of vertices enumerated}
Initialize: $T = (V, E')$ with $E' = \emptyset$ {the tree to return}

(1) Recurse($G$, $T$, $v_{\text{now}}$, $w$)

Output: $T$ {a depth-first spanning tree of G}


Recurse
Input: $G = (V, E)$ a graph, $T = (V, E')$ a tree, $v_{\text{now}}$ current vertex, $w$ the sequence

(1) for each $v \in N(v_{\text{now}})$ do
(2)   if $v \notin w$ then
(3)     Append $v$ to $w$
(4)     Add $\{v_{\text{now}}, v\}$ to $E'$
(5)     Recurse($G$, $T$, $v$, $w$)
(6)   end if
(7) end for


Algorithm 4: Depth-first search spanning tree.


Example 5.13. We illustrate the breadth-first and depth-first spanning
tree constructions in Fig. 5.3. In the BFS, we start at a vertex
and add it to a growing tree. We then add the vertices and edges that
are adjacent to this tree, keeping track of the vertices just added.
We repeat this process with the vertices just added, being careful not
to repeat any vertices as this would create a cycle.

The DFS approach to building a spanning tree works similarly,
except now we continue exploring along a path (adding vertices and
edges as we go) until we can explore no further because we have hit
a vertex with a degree of one or all the neighbors of that vertex have
already been added to the tree. We then backtrack until we find a
new direction to explore.

Note that the two approaches generate different spanning trees.
This is not surprising since their search patterns are very different.
Whether a DFS or BFS is used in exploring a graph depends on the
situation.

Fig. 5.3 (a) The construction of a breadth-first spanning tree is illustrated.
(b) The construction of a depth-first spanning tree is illustrated.


Remark 5.14. Building a breadth- (or depth-) first spanning tree is a
straightforward algorithmic way to check whether a graph is connected.
In Chapter 6, we use a BFS as part of our analysis of flows
on graphs.
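
The connectivity check mentioned in Remark 5.14 might look like the following sketch. The names bfs_spanning_tree and is_connected and the adjacency-dictionary encoding are assumptions made here. An edge is added only when its far endpoint has not been reached before, so the recorded edges cannot close a cycle.

```python
def bfs_spanning_tree(adj, v0):
    """Return (tree_edges, reached) for a breadth-first search from v0."""
    frontier = [v0]
    reached = {v0}
    tree_edges = []
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in adj[v]:
                if u not in reached:  # u is new, so {v, u} cannot close a cycle
                    reached.add(u)
                    next_frontier.append(u)
                    tree_edges.append((v, u))
        frontier = next_frontier
    return tree_edges, reached


def is_connected(adj):
    """A graph is connected exactly when the search reaches every vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    _, reached = bfs_spanning_tree(adj, start)
    return len(reached) == len(adj)


# A hypothetical graph: a triangle on 1, 2, 3 plus an isolated vertex 4.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: []}
edges, reached = bfs_spanning_tree(graph, 1)
print(edges)                # [(1, 2), (1, 3)]
print(is_connected(graph))  # False
```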


5.3 Prim’s Spanning Tree Algorithm


Definition 5.15 (Weighted graph). A weighted graph is a pair
$(G, w)$ where $G = (V, E)$ is a graph and $w : E \to \mathbb{R}$ is a weight
function.


Remark 5.16. A weighted digraph is defined analogously. 


Example 5.17. Consider the graph shown in Fig. 5.4. A weighted 
graph is simply a graph with a real number (the weight) assigned to 
each edge. Weighted graphs arise in several instances, such as travel 
planning and communications, as well as in graph-based data science. 


Remark 5.18. Any graph can be thought of as a weighted graph
in which we assign a weight of 1 to each edge. The distance between
two vertices in a graph can then easily be generalized in a weighted
graph. If $p = (v_1, e_1, v_2, \ldots, v_n, e_n, v_{n+1})$ is a path, then the weight
of the path is

$$\sum_{i=1}^{n} w(e_i).$$

Thus, in a weighted graph, the distance between two vertices $v_1$ and
$v_2$ is the weight of the least weight path connecting $v_1$ and $v_2$. We
study the problem of finding this distance in Section 5.6.

Fig. 5.4 A weighted graph is simply a graph with a real number (the weight)
assigned to each edge.


Definition 5.19 (Graph weight). Let $(G, w)$ be a weighted graph
with $G = (V, E)$. If $H = (V', E')$ is a subgraph of $G$, then the weight
of $H$ is

$$w(H) = \sum_{e \in E'} w(e).$$
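
Both weights are plain sums over edges, as the short sketch below shows. The encoding of the weight function as a dictionary keyed by frozenset edges, the helper names, and the three-edge example are all assumptions made for illustration.

```python
def path_weight(w, vertices):
    """Weight of the path visiting the given vertices in order."""
    return sum(w[frozenset((a, b))] for a, b in zip(vertices, vertices[1:]))


def subgraph_weight(w, edges):
    """Weight of a subgraph, given as an iterable of its edges."""
    return sum(w[frozenset(e)] for e in edges)


# A hypothetical weighted triangle.
w = {frozenset((1, 2)): 4, frozenset((2, 3)): 1, frozenset((1, 3)): 7}
print(path_weight(w, [1, 2, 3]))            # 5
print(subgraph_weight(w, [(1, 2), (1, 3)])) # 11
```

Here the path 1, 2, 3 has weight 5, which is smaller than the weight 7 of the single edge {1, 3}, so the distance from Vertex 1 to Vertex 3 in this example is 5.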


Definition 5.20 (Minimum spanning forest problem). Let
$(G, w)$ be a weighted graph with $G = (V, E)$. The minimum spanning
forest problem for $G$ is to find a forest $F = (V', E')$ that is a
spanning subgraph of $G$ with the same connected components as $G$
and that has the smallest possible weight among all such forests.


Remark 5.21. If (G,w) is a weighted graph and G is connected, 
then the minimum spanning forest problem becomes the minimum 
spanning tree (MST) problem. 


Example 5.22. An MST for the weighted graph shown in Fig. 5.4 
is shown in Fig. 5.5. In the MST problem, we attempt to find a span- 
ning subgraph of a graph G that is a tree and has minimal weight 
(among all spanning trees). We verify that the proposed spanning 
tree is minimal when we derive algorithms for constructing a mini- 
mum spanning forest. 


Remark 5.23. The following algorithm, commonly called Prim’s 
algorithm [42], constructs an MST for a connected graph. 


Example 5.24. We illustrate the successive steps of Prim’s algo- 
rithm in Fig. 5.6. At the start, we initialize our set V’ = {1} and the 
edge set is empty. At each successive iteration, we add an edge that 
connects a vertex in V’ with a vertex not in V’ that has minimum 
weight. Going from Iteration 1 to Iteration 2, we could have chosen 
to add either edge {1,3} or edge {4,6} (the order doesn’t matter), 
so any tie-breaking method will suffice. We continue adding edges

Fig. 5.5 In the minimum spanning tree problem, we attempt to find a spanning
subgraph of a graph G that is a tree and has minimal weight (among all spanning 
trees). 


Prim’s Algorithm

Input: $(G, w)$ a weighted connected graph with $G = (V, E)$, $v_0$ a starting vertex
Initialize: $E' = \emptyset$ {the edge set of the spanning tree}
Initialize: $V' = \{v_0\}$ {new vertices added to the spanning tree}

(1) while $V' \neq V$
(2)   Set $X := V \setminus V'$
(3)   Choose edge $e = \{v, v'\}$ so (i) $v \in V'$; (ii) $v' \in X$ and:
        $$w(e) = \min_{u \in V',\, u' \in X} w(\{u, u'\})$$
(4)   Set $E' = E' \cup \{e\}$
(5)   Set $V' = V' \cup \{v'\}$
(6) end while

Output: $T = (V', E')$ {T is an MST.}


Algorithm 5: Prim’s algorithm.


until all vertices in the original graph are in the spanning tree. Note 
that the output from Prim’s algorithm matches the MST shown in 
Fig. 5.5. 
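
A direct Python transcription of the idea behind Prim’s algorithm is sketched below. The weights are hypothetical (they are not the weights of Fig. 5.4), and the function name prim and the frozenset edge encoding are assumptions made here. Ties are broken by whichever candidate edge is encountered first.

```python
def prim(vertices, w, v0):
    """Return the edges of a minimum spanning tree of a connected graph.

    w maps frozenset({u, v}) to the weight of the edge {u, v}.
    """
    in_tree = {v0}      # V'
    tree_edges = []     # E'
    all_vertices = set(vertices)
    while in_tree != all_vertices:
        # Edges with exactly one endpoint already in the tree.
        candidates = [e for e in w if len(e & in_tree) == 1]
        e = min(candidates, key=lambda edge: w[edge])
        tree_edges.append(e)
        in_tree |= e    # adds the endpoint that was outside the tree
    return tree_edges


# A hypothetical weighted graph on four vertices.
w = {
    frozenset((1, 2)): 1,
    frozenset((2, 3)): 2,
    frozenset((1, 3)): 3,
    frozenset((3, 4)): 4,
    frozenset((2, 4)): 5,
}
mst = prim([1, 2, 3, 4], w, 1)
print(sum(w[e] for e in mst))  # 7
```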


Theorem 5.25. Let (G,w) be a weighted connected graph. Then, 
Prim’s algorithm returns an MST. 


Proof. We show by induction that at each iteration of Prim’s algorithm,
the tree $(V', E')$ is a sub-tree of some MST $T$ of $(G, w)$. If this
is the case, then at the termination of the algorithm, $(V', E')$ must
be equal to an MST $T$.

To establish the base case, note that at the first iteration, $V' = \{v_0\}$
and $E' = \emptyset$; therefore, $(V', E')$ must be a sub-tree of $T$, which
is an MST of $(G, w)$. Now, suppose that the statement is true for all
iterations up to and including $k$, and let $T_k = (V', E')$ at iteration $k$.
Suppose at iteration $k + 1$, we add edge $e = \{v, v'\}$ to $T_k$ to obtain
$T_{k+1} = (U, F)$ with $U = V' \cup \{v'\}$ and $F = E' \cup \{e\}$. Suppose
that $T_{k+1}$ is not a sub-tree of $T$; then $e$ is not an edge in $T$, and
thus, adding $e$ to $T$ must generate a cycle. On this cycle, there is some
edge $e' = \{u, u'\}$ of $T$, other than $e$, with $u \in V'$ and $u' \notin V'$.
At iteration $k + 1$, we must have considered adding this edge to $E'$.
We know by the selection of $e$ that $w(e) \leq w(e')$. We have two cases:

Case 1: If $w(e) < w(e')$, then if we construct $T'$ from $T$ by removing
$e'$ and adding $e$, we know that $T'$ must span $G$ (this is illustrated in
Fig. 5.7) and $w(T') < w(T)$; thus, $T$ was not an MST of $G$, which
contradicts our assumption about $T$. So, this case cannot occur.

Case 2: If $w(e) = w(e')$, then again constructing $T'$ from $T$ by
removing $e'$ and adding $e$, we obtain a second MST $T'$. Since $e'$ is
not an edge of $T_k$, the tree $T_k$ is still a sub-tree of $T'$, and so
$T_{k+1}$ is a sub-tree of that MST.

Therefore, $T_{k+1}$ must be a sub-tree of at least one MST. The
result follows by induction.

Fig. 5.6 Prim’s algorithm constructs a minimum spanning tree by successively
adding edges to an acyclic subgraph until every vertex is inside the spanning tree.
Edges with minimal weight are added at each iteration.

Fig. 5.7 When we remove an edge ($e'$) from a spanning tree we disconnect the
tree into two components. By adding a new edge ($e$) that connects vertices in these
two distinct components, we reconnect the tree, and it is still a spanning tree.


5.4 Computational Complexity of Prim’s Algorithm 


Remark 5.26. In this section, we discuss computational complex- 
ity. This is a subject that has its own course in many computer sci- 
ence departments (and some math departments). Therefore, we can 
only scratch the surface of this fascinating topic, and we will not be 
able to provide completely formal definitions for all concepts. When 
definitions are informal, they will occur in remarks rather than in 
definition blocks. 


Definition 5.27 (Big-O). Let $f, g : \mathbb{R} \to \mathbb{R}$. The function $f(x)$ is
in the family $O(g(x))$ if there is an $N \in \mathbb{R}$ and a $c \in \mathbb{R}$ so that for
all $x > N$, $|f(x)| \leq c|g(x)|$.
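
As a small worked instance of Definition 5.27, take $f(x) = 3x^2 + 5x$ and $g(x) = x^2$. These functions are chosen here purely for illustration, and the constants $N = 5$ and $c = 4$ are one valid choice among many.

```latex
\begin{align*}
x > 5 \;&\Longrightarrow\; 5x < x^2, \\
|f(x)| = 3x^2 + 5x \;&<\; 3x^2 + x^2 = 4x^2 = 4\,|g(x)| \quad \text{for all } x > 5.
\end{align*}
```

Hence $f(x)$ is in $O(x^2)$ with $N = 5$ and $c = 4$.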


Remark 5.28. We have the following informal definition of algo- 
rithmic running time. The running time of an algorithm is the count 
of the number of steps an algorithm takes from the time we begin 
executing it to the time we obtain the output. We must be sure to 
include each time through any loops in the algorithm. This is not 
to be confused with the wall clock running time of an algorithm, 
which is dependent on the processor (a computer, a human, etc.) as 
well as the algorithmic details. Again, a more formal definition for 
algorithmic running time is given in Ref. [41]. 


Remark 5.29. In computing algorithmic running time, we need to 
be very careful about how we interpret the steps in the algorithm. For 
example, Prim’s algorithm uses the word “Choose” in Line 3. But for
a computer, this involves an enumeration of all the edges that might
be connected to a specific vertex. Algorithm 6 does just that. In Lines
5-12, we enumerate over all possible edges and vertices that can be
added to the spanning tree and find an edge with minimal weight;
Lines 13-14 then add it to the tree.


Prim’s Algorithm (Explicit Form)

Input: $(G, w)$ a weighted connected graph with $G = (V, E)$, $v_0$ a starting vertex
Initialize: $E' = \emptyset$ {the edge set of the spanning tree}
Initialize: $V' = \{v_0\}$ {new vertices added to the spanning tree}

 (1) while $V' \neq V$
 (2)   Set $X := V \setminus V'$
 (3)   Set $e := \emptyset$
 (4)   Set $w^* = \infty$
 (5)   for each $v \in V'$
 (6)     for each $v' \in X$
 (7)       if $\{v, v'\} \in E$ and $w(\{v, v'\}) < w^*$
 (8)         $w^* = w(\{v, v'\})$
 (9)         $e := \{v, v'\}$ and $v^* := v'$ {remember the new endpoint}
(10)       end if
(11)     end for
(12)   end for
(13)   Set $E' = E' \cup \{e\}$
(14)   Set $V' = V' \cup \{v^*\}$
(15) end while

Output: $T = (V', E')$ {T is a minimum spanning tree.}


Algorithm 6: Prim’s algorithm (explicit form).


Remark 5.30. Algorithm 6 is not optimal. It is intentionally not 
optimal so that we can compute its complexity in Theorem 5.31 
easily and we do not have to appeal to special data structures. See 
Remark 5.32 for more on this. 


Theorem 5.31. The running time of Algorithm 6 is $O(|V|^3)$.


Proof. Consider the steps in the while loop starting at Line 1. If
there are $n$ vertices, then at iteration $k$ of the while loop, we know
that $|V'| = k$ and $|X| = n - k$ since we add one new vertex to
$V'$ at each while loop iteration (and thus remove one vertex from
$X$ at each while loop iteration). The for loop beginning at Line 5
will have $k$ iterations, and for each of them, the for loop starting at
Line 6 will have $n - k$ iterations. This means that for any iteration of
the while loop, we perform $O(k(n - k))$ operations. Thus, for the whole
algorithm, we perform

$$O\left(\sum_{k=1}^{n-1} k(n - k)\right) = O\left(\frac{n^3 - n}{6}\right)$$

operations. Thus, the running time for Algorithm 6 is $O(n^3) = O(|V|^3)$.
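
The count in this proof can be checked numerically. The sketch below follows the structure of the explicit form of Prim’s algorithm and counts how often the inner test is executed; the function name prim_explicit, the counter, and the example weights are assumptions made here. For a connected graph on $n$ vertices, the counter should equal $\sum_{k=1}^{n-1} k(n-k) = (n^3 - n)/6$.

```python
def prim_explicit(vertices, w, v0):
    """Prim's algorithm with explicit loops; returns (tree_edges, checks)."""
    all_vertices = set(vertices)
    in_tree = {v0}
    tree_edges = []
    checks = 0  # number of times the inner "if" is evaluated
    while in_tree != all_vertices:
        outside = all_vertices - in_tree
        best_edge, best_vertex, best_weight = None, None, float("inf")
        for v in in_tree:
            for u in outside:
                checks += 1
                e = frozenset((v, u))
                if e in w and w[e] < best_weight:
                    best_edge, best_vertex, best_weight = e, u, w[e]
        if best_edge is None:
            raise ValueError("graph is not connected")
        tree_edges.append(best_edge)
        in_tree.add(best_vertex)
    return tree_edges, checks


# A hypothetical weighted graph on four vertices.
w = {
    frozenset((1, 2)): 1,
    frozenset((2, 3)): 2,
    frozenset((1, 3)): 3,
    frozenset((3, 4)): 4,
    frozenset((2, 4)): 5,
}
n = 4
tree_edges, checks = prim_explicit(range(1, n + 1), w, 1)
print(checks, (n**3 - n) // 6)  # 10 10
```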


Remark 5.32. As it turns out, the implementation of Prim’s 
algorithm can have a substantial impact on the running time. 
There are implementations of Prim’s algorithm that run in $O(|V|^2)$,
$O(|E| \log(|V|))$, and $O(|E| + |V| \log(|V|))$ [28]. Thus, we cannot just
say that Prim’s algorithm is an $O(g(x))$ algorithm; we must know
which implementation of Prim’s algorithm we are using. Clearly, the 
implementation in Algorithm 6 is not a very good one. Reference [28] 
has complete details on better implementations of Prim’s algorithm 
using special ways of storing the data used in the algorithm. 
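
To illustrate how the choice of data structure matters, here is a sketch of Prim’s algorithm that keeps candidate edges in a binary heap using Python’s standard heapq module. It is one possible implementation, not the one analyzed in Ref. [28]; the name prim_heap and the adjacency-list encoding are assumptions. With a binary heap, each edge is pushed and popped at most a constant number of times, which is consistent with the $O(|E| \log(|V|))$ bound quoted above for simple graphs.

```python
import heapq


def prim_heap(adj, v0):
    """Return (total_weight, tree_edges) for a connected weighted graph.

    adj maps each vertex to a list of (neighbor, weight) pairs.
    Vertices are assumed to be mutually comparable (e.g., integers).
    """
    in_tree = {v0}
    heap = [(wt, v0, u) for u, wt in adj[v0]]
    heapq.heapify(heap)
    tree_edges = []
    total = 0
    while heap and len(in_tree) < len(adj):
        wt, v, u = heapq.heappop(heap)
        if u in in_tree:
            continue  # stale entry: u joined the tree after this edge was pushed
        in_tree.add(u)
        tree_edges.append((v, u))
        total += wt
        for x, wx in adj[u]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, u, x))
    return total, tree_edges


# The same hypothetical four-vertex graph, as adjacency lists.
adj = {
    1: [(2, 1), (3, 3)],
    2: [(1, 1), (3, 2), (4, 5)],
    3: [(2, 2), (1, 3), (4, 4)],
    4: [(3, 4), (2, 5)],
}
print(prim_heap(adj, 1))  # (7, [(1, 2), (2, 3), (3, 4)])
```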


Definition 5.33 (Polynomial running time). For a specific
implementation of an algorithm, its running time is polynomial if
there is some polynomial $p(x)$ so that when the running time of the
algorithm is $f(x)$, then $f(x) \in O(p(x))$.


Theorem 5.34. There is an implementation of Prim’s algorithm 
that is polynomial. 


5.5 Kruskal’s Algorithm 


Remark 5.35. In this section, we discuss Kruskal’s algorithm [43], 
which is an alternative way to construct an MST of a weighted graph 
(G,w). The algorithm is shown in Algorithm 7. 


Example 5.36. We illustrate Kruskal’s algorithm in Fig. 5.8. The 
spanning subgraph starts with each vertex in the graph and no edges. 
In each iteration, we add the edge with the lowest edge weight, pro- 
vided that it does not cause a cycle to emerge in the existing sub- 
graph. It is purely coincidental that the construction of the spanning 
tree occurs in exactly the same set of steps as Prim’s algorithm.
This is not always the case. In this example, edges that cause a cycle
are not considered until after the MST has been constructed. Also note
that we have stopped the algorithm when the MST has been constructed.
Strictly following Algorithm 7 would mean we should check the
remaining edges (and thus remove them from Q).

Kruskal’s Algorithm

Input: $(G, w)$ a weighted connected graph with $G = (V, E)$ and $n = |V|$
Initialize: $Q = E$
Initialize: $V' = V$
Initialize: $E' = \emptyset$
Initialize: For all $v \in V$ define $C(v) := \{v\}$ {$C(v)$ is the set of vertices with distance
from $v$ less than infinity at each iteration.}

 (1) while $|Q| > 0$
 (2)   Choose the edge $e = \{v, v'\}$ in $Q$ with minimum weight.
 (3)   if $C(v) \neq C(v')$
 (4)     for each $u \in C(v)$: $C(u) := C(u) \cup C(v')$
 (5)     for each $u \in C(v')$: $C(u) := C(u) \cup C(v)$
 (6)     $E' := E' \cup \{e\}$
 (7)     $Q := Q \setminus \{e\}$
 (8)   else
 (9)     $Q := Q \setminus \{e\}$
(10)     GOTO 1 {return to the loop test, since Q may now be empty}
(11)   end if
(12) end while

Output: $T = (V', E')$ {T is a minimum spanning tree.}


Algorithm 7: Kruskal’s algorithm.

Fig. 5.8 Kruskal’s algorithm constructs an MST by successively adding edges
and maintaining an acyclic disconnected subgraph containing every vertex until
that subgraph contains $n - 1$ edges, at which point we are sure it is a tree. Edges
with minimal weight are added at each iteration.
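
Kruskal’s algorithm, with the component sets $C(v)$ kept explicitly, can be sketched as below. The function name kruskal, the frozenset edge encoding, and the example weights are assumptions made for illustration. Sorting the edges once plays the role of repeatedly choosing the minimum-weight edge in Q.

```python
def kruskal(vertices, w):
    """Return the edges of a minimum spanning tree of a connected graph.

    w maps frozenset({u, v}) to the weight of the edge {u, v}.
    """
    component = {v: {v} for v in vertices}  # C(v)
    tree_edges = []
    for e in sorted(w, key=lambda edge: w[edge]):
        u, v = tuple(e)
        # Vertices in the same component share one set object.
        if component[u] is not component[v]:
            merged = component[u] | component[v]
            for x in merged:
                component[x] = merged
            tree_edges.append(e)
        # Otherwise e would close a cycle and is simply discarded.
    return tree_edges


# A hypothetical weighted graph on four vertices.
w = {
    frozenset((1, 2)): 1,
    frozenset((2, 3)): 2,
    frozenset((1, 3)): 3,
    frozenset((3, 4)): 4,
    frozenset((2, 4)): 5,
}
mst = kruskal([1, 2, 3, 4], w)
print(sum(w[e] for e in mst))  # 7
```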


Remark 5.37. We prove the following theorem in the last section 
of this chapter using a very generalized method. It can be shown by 
induction, just as we did in Theorem 5.25. 


Theorem 5.38. Let (G,w) be a weighted connected graph. Then, 
Kruskal’s algorithm returns an MST. 


Remark 5.39. Reference [28] contains a proof of the following the- 
orem and shows how to use special data structures for the efficient 
implementation of Kruskal’s algorithm. 


Theorem 5.40. There is an implementation of Kruskal’s algorithm 
whose running time is $O(|E| \log(|V|))$.
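
One data structure often used for this purpose is a disjoint-set (union-find) forest. The sketch below is an assumption-laden illustration rather than the implementation from Ref. [28]: the name kruskal_union_find, the edge-list encoding, and the use of path halving are choices made here. In this sketch, sorting the edges is the dominant cost.

```python
def kruskal_union_find(vertices, edges):
    """Kruskal's algorithm; edges is a list of (weight, u, v) triples."""
    parent = {v: v for v in vertices}

    def find(x):
        # Follow parent pointers to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for wt, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:         # u and v lie in different components
            parent[root_u] = root_v  # merge the two components
            tree.append((u, v, wt))
    return tree


# A hypothetical weighted graph on four vertices.
edges = [(1, 1, 2), (2, 2, 3), (3, 1, 3), (4, 3, 4), (5, 2, 4)]
print(kruskal_union_find([1, 2, 3, 4], edges))
# [(1, 2, 1), (2, 3, 2), (3, 4, 4)]
```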


5.6 Shortest Path Problem in a Positively 
Weighted Graph 


Remark 5.41. The shortest path problem in a weighted graph is the 
problem of finding the least weight path connecting a given vertex $v$
to a given vertex $v'$. Dijkstra’s algorithm [44] answers this question
by growing a spanning tree starting at $v$ so that the unique path from
$v$ to any other vertex $v'$ in the tree is the shortest. The algorithm
is shown in Algorithm 8. It is worth noting that this algorithm only
works when the weights in the graph are all positive. We discuss
Floyd’s algorithm [45], which addresses this limitation, later in the
chapter.


Example 5.42. An example execution of Dijkstra’s algorithm is 
shown in Fig. 5.9. At the start of the algorithm, we have Vertex
1 ($v_0$) as the vertex in the set Q that is closest to $v_0$ (it has a
distance of 0, obviously). Investigating its neighbor set, we identify three
vertices 2, 3, and 4, and the path length from Vertex 1 to each of
these vertices is smaller than the initialized distance of $\infty$, so these
vertices are assigned Vertex 1 as a parent, denoted by p(v), and the
new distances are recorded. Vertex 1 is then removed from the set Q.

Dijkstra’s Algorithm

Input: $(G, w)$ a weighted connected graph with $G = (V, E)$, $v_0$ an initial vertex.
Initialize: $Q := V$
Initialize: For all $v \in V$, if $v \neq v_0$ define $d(v_0, v) := \infty$; otherwise define $d(v_0, v) := 0$
{$d(v_0, v)$ is the best distance from $v_0$ to $v$.}
Initialize: For all $v \in V$, $p(v) :=$ undefined {A “parent” function that will be used to
build the tree.}

 (1) while $Q \neq \emptyset$
 (2)   Choose $v \in Q$ so that $d(v_0, v)$ is minimal
 (3)   $Q := Q \setminus \{v\}$
 (4)   for each $v' \in N(v)$
 (5)     Define $\delta(v_0, v') := d(v_0, v) + w(\{v, v'\})$
 (6)     if $\delta(v_0, v') < d(v_0, v')$
 (7)       $d(v_0, v') := \delta(v_0, v')$
 (8)       $p(v') := v$
 (9)     end if
(10)   end for
(11) end while
(12) Set $V' := V$
(13) Set $E' := \emptyset$
(14) for each $v \in V$
(15)   if $p(v) \neq$ undefined
(16)     $E' := E' \cup \{\{v, p(v)\}\}$
(17)   end if
(18) end for

Output: $T = (V', E')$ and $d(v_0, \cdot)$ {T is a Dijkstra tree, $d(v_0, \cdot)$ provides the distances.}


Algorithm 8: Dijkstra’s algorithm (adapted from Wikipedia’s pseu- 
docode, http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm). 
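
A straightforward Python sketch of Dijkstra’s algorithm follows. The weights below are hypothetical: they were chosen here to be consistent with the narrative of Example 5.42 (Vertex 3 is closest after Vertex 1, and the route through Vertex 3 to Vertex 4 has length 9), but they are not read from Fig. 5.9. The function name and the nested-dictionary encoding are also assumptions.

```python
import math


def dijkstra(adj, v0):
    """Return (dist, parent) for a graph with positive edge weights.

    adj maps each vertex to a dictionary {neighbor: weight}.
    """
    dist = {v: math.inf for v in adj}   # d(v0, v)
    dist[v0] = 0
    parent = {v: None for v in adj}     # p(v); None stands for "undefined"
    queue = set(adj)                    # Q
    while queue:
        v = min(queue, key=lambda x: dist[x])  # closest vertex still in Q
        queue.remove(v)
        for u, wt in adj[v].items():
            candidate = dist[v] + wt
            if candidate < dist[u]:
                dist[u] = candidate
                parent[u] = v
    return dist, parent


# Hypothetical weights consistent with the description in Example 5.42.
adj = {
    1: {2: 7, 3: 5, 4: 12},
    2: {1: 7, 4: 6},
    3: {1: 5, 4: 4},
    4: {1: 12, 2: 6, 3: 4},
}
dist, parent = dijkstra(adj, 1)
print(dist)    # {1: 0, 2: 7, 3: 5, 4: 9}
print(parent)  # {1: None, 2: 1, 3: 1, 4: 3}
```

The tree edges are recovered from the parent function by pairing each vertex other than the start with its parent.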


In the second iteration, we see that Vertex 3 is closest to $v_0$
(Vertex 1), and investigating its neighborhood, we see that the total
distance from Vertex 1 to Vertex 3 and then from Vertex 3 to Vertex
4 is 9, which is smaller than the currently recorded distance from
Vertex 1 to Vertex 4. Thus, we update the parent function of Vertex
4 so that it returns Vertex 3 instead of Vertex 1. We also update the
distance function and continue to the next iteration. The next closest
vertex to $v_0$ is Vertex 2. Investigating its neighbors shows that
no changes need to be made to the distance or parent function. We
continue in this way until all the vertices are exhausted.


Theorem 5.43. Let $(G, w)$ be a weighted connected graph with vertex
$v_0$. Then, Dijkstra’s algorithm returns a spanning tree $T$ so that the
distance from $v_0$ to any vertex $v$ in $T$ is the minimum distance from
$v_0$ to $v$ in $(G, w)$.


Proof. We proceed by induction to show that the distances from
$v_0$ to every vertex $v \in V \setminus Q$ are correct when $v$ is removed from $Q$.
To do this, define $X = V \setminus Q$, and let $T_k$ be the tree generated by
the vertices in $X$ and the function $p(v)$ at iteration $k$.

In the base case, when $v_0$ is removed from $Q$ (added to $X$), it is
clear that $d(v_0, v_0) = 0$ is correct. Now, assume that the statement
is true up to the $k$th iteration so that for any vertex $v'$ added to $X$
prior to the $k$th iteration, $d(v_0, v')$ is correct and the unique path in
$T_k$ from $v_0$ to $v'$ defined by the function $p(v)$ is the minimum distance
path from $v_0$ to $v'$ in $(G, w)$.

Before proceeding, note that for any vertex $v'$ added to $X$ at
iteration $k$, $p(v')$ is fixed permanently after that iteration. Thus, the
path from $v_0$ to $v'$ in $T_k$ is the same as the path from $v_0$ to $v'$ in $T_{k+1}$.
Therefore, assuming $d(v_0, v')$ and $p(v')$ to be correct at iteration $k$
implies that they must be correct at all future iterations or, more
generally, that they are correct for $(G, w)$.

Suppose vertex $v$ is added to set $X$ (removed from $Q$) at iteration
$k + 1$, but the shortest path from $v_0$ to $v$ is not the unique path from
$v_0$ to $v$ in the tree $T_{k+1}$ constructed from the vertices in $X$ and the
function $p(v)$. Since $G$ is connected, there is a shortest path, and
we now have the following two possibilities: (i) the shortest path
connecting $v_0$ to $v$ passes through a vertex not in $X$; or (ii) the
shortest path connecting $v_0$ to $v$ passes through only vertices in $X$.

In the first case, if the true shortest path connecting $v_0$ to $v$ passes
through a vertex $u$ not in $X$, then we have the following two new
possibilities: (a) $d(v_0, u) = \infty$; or (b) $d(v_0, u) = r < \infty$. We may dismiss
possibility (a) as infeasible; thus, we have $d(v_0, u) = r < \infty$. In order
for the distance from $v_0$ to $v$ to be less along the path containing $u$,
we know that $d(v_0, u) < d(v_0, v)$. But if that’s true, then in Step 2 of
Algorithm 8, we should have evaluated the neighborhood of $u$
before evaluating the neighborhood of $v$ in the for loop starting at
Line 4, and thus, $u$ must be an element of $X$ (i.e., not in $Q$). This
contradicts our assumption and leads to the second case.

Suppose now that the true shortest path from $v_0$ to $v$ leads to a
vertex $v''$ before reaching $v$ while the path recorded in $T_{k+1}$ reaches
$v'$ before reaching $v$, as illustrated in the following figure.

[Figure: two candidate paths from $v_0$ to $v$, one entering $v$ through $v'$
and one through $v''$. Any path containing vertices not in $X$ is ignored.]

Let $w_1 = w(v', v)$ and $w_2 = w(v'', v)$. Then, it follows that
$d(v_0, v') + w_1 > d(v_0, v'') + w_2$. By the induction hypothesis, $d(v_0, v')$
and $d(v_0, v'')$ are both correct as is their path in $T_{k+1}$. However,
since both $v'$ and $v''$ are in $X$, we know that the neighborhoods of
both these vertices were evaluated in the for loop at Line 4. If, when
$N(v')$ was evaluated, we had $p(v) = v''$, then this would still be the case
at iteration $k + 1$ since Line 6 specifically forbids changes to $p(v)$ unless
$d(v_0, v') + w_1 < d(v_0, v'') + w_2$. On the other hand, if $p(v) = v'$ when
$N(v'')$ was evaluated, then it’s clear at once that $p(v) = v''$ at the
end of the evaluation of the if statement at Line 6 and would continue
to be so at iteration $k + 1$. This contradicts our assumption
on the structure of $T_{k+1}$. In either case, $d(v_0, v)$ could not be incorrect
in $T_{k+1}$, and the correctness of Dijkstra’s algorithm follows from
induction.

Fig. 5.9 Dijkstra’s algorithm iteratively builds a tree of shortest paths from a
given vertex $v_0$ in a graph. Dijkstra’s algorithm can correct itself, as we see from
Iterations 2 and 3.


Remark 5.44. A proof of the following theorem can be found in 
Ref. [28], which discusses special data structures for the implemen- 
tation of Dijkstra’s algorithm. 


Theorem 5.45. There is an implementation of Dijkstra’s algorithm 
that has a running time of $O(|E| + |V| \log(|V|))$.
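
As an illustration of how a priority queue speeds up the "choose the closest vertex" step, the sketch below uses Python’s heapq module. This is a binary heap, so the sketch should not be read as achieving the bound in Theorem 5.45, which is usually associated with more elaborate heap structures; a binary heap gives roughly $O(|E| \log(|V|))$ instead. The function name, encoding, and weights are assumptions made here.

```python
import heapq
import math


def dijkstra_heap(adj, v0):
    """Shortest distances from v0 for a graph with positive edge weights.

    adj maps each vertex to a dictionary {neighbor: weight}.
    """
    dist = {v: math.inf for v in adj}
    dist[v0] = 0
    heap = [(0, v0)]
    done = set()  # vertices already removed from Q
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue  # stale entry left over from an earlier, larger distance
        done.add(v)
        for u, wt in adj[v].items():
            if d + wt < dist[u]:
                dist[u] = d + wt
                heapq.heappush(heap, (dist[u], u))
    return dist


# Hypothetical positive weights on four vertices.
adj = {
    1: {2: 7, 3: 5, 4: 12},
    2: {1: 7, 4: 6},
    3: {1: 5, 4: 4},
    4: {1: 12, 2: 6, 3: 4},
}
print(dijkstra_heap(adj, 1))  # {1: 0, 2: 7, 3: 5, 4: 9}
```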


Remark 5.46. Dijkstra’s algorithm is an example of a dynamic pro- 
gramming [46] approach to finding the shortest path in a graph. 
Dynamic programming is a sub-discipline of mathematical program- 
ming (or optimization), which we will encounter in the coming 
chapters. 


5.7 Floyd–Warshall Algorithm


Remark 5.47. Dijkstra’s algorithm is an efficient algorithm for 
graphs with non-negative weights. However, Dijkstra’s algorithm can 
yield results that are incorrect when working with graphs with nega- 
tive edge weights. To see this, consider the graph shown in Fig. 5.10. 


Fig. 5.10 This graph has negative edge weights that lead to confusion in
Dijkstra’s algorithm.

Executing Dijkstra’s algorithm from Vertex 1 leads to the following
data:

(1) At initialization, $Q = \{1, 2, 3, 4\}$ and $d(1, 1) = 0$ while $d(1, v) = \infty$
    for $v \in \{2, 3, 4\}$. More importantly, $p(1) = p(2) = p(3) = p(4) =$
    undefined. (Recall that $p(\cdot)$ is the parent function used to build
    the Dijkstra tree.)

(2) At the first step, Vertex 1 is removed from $Q$, and we examine its
    neighbors. During this step, we define: $d(1, 1) = 0$, $d(1, 2) = -2$,
    $d(1, 3) = 3$, and $d(1, 4) = \infty$ as well as $p(1) =$ undefined,
    $p(2) = p(3) = 1$, and $p(4) =$ undefined. Now, $Q = \{2, 3, 4\}$.

(3) At the second stage, Vertex 2 is the vertex closest to Vertex 1,
    so it is removed and we compute on its neighbors. We see that
    $d(1, 1) = -4$, $d(1, 2) = -2$, $d(1, 3) = -3$, and $d(1, 4) = 0$ as well
    as $p(1) = 2$, $p(2) = 1$, $p(3) = 2$, and $p(4) = 2$. Clearly, we have
    a problem already since $p(1) = 2$ and $p(2) = 1$, which means to
    get to Vertex 1, we go through Vertex 2 and vice versa. At this
    stage, $Q = \{3, 4\}$.

(4) We continue by removing Vertex 3 from $Q$ and computing on its
    neighbors. We now have $d(1, 1) = -4$, $d(1, 2) = -4$, $d(1, 3) = -3$,
    and $d(1, 4) = -5$ as well as $p(1) = 2$, $p(2) = 3$, $p(3) = 2$, and
    $p(4) = 3$.

(5) Completing the algorithm and computing on the neighbors of 4
    yields: $d(1, 1) = -4$, $d(1, 2) = -4$, $d(1, 3) = -7$, and $d(1, 4) = -5$
    as well as $p(1) = 2$, $p(2) = 3$, $p(3) = 4$, and $p(4) = 3$. The
    resulting parent function cannot define a proper tree structure,
    and the algorithm fails.


These steps are illustrated in Fig. 5.11. 
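
The trace in steps (1)-(5) can be reproduced with a short script. The edge weights below are not read from Fig. 5.10; they are inferred here from the distances listed in the steps (for example, $d(1, 2) = -2$ after the first step gives the weight of edge {1, 2}), so treat them as an assumption. The function name dijkstra_trace is likewise invented for this sketch.

```python
import math


def dijkstra_trace(adj, v0):
    """Run Dijkstra's algorithm and print the state after each removal."""
    dist = {v: math.inf for v in adj}
    dist[v0] = 0
    parent = {v: None for v in adj}
    queue = set(adj)
    while queue:
        v = min(queue, key=lambda x: dist[x])
        queue.remove(v)
        for u, wt in adj[v].items():
            # Neighbors are updated even if they have already left the queue.
            if dist[v] + wt < dist[u]:
                dist[u] = dist[v] + wt
                parent[u] = v
        print(f"removed {v}: d = {dist}, p = {parent}")
    return dist, parent


# Edge weights inferred from the distances in steps (1)-(5); an assumption.
weights = {(1, 2): -2, (1, 3): 3, (2, 3): -1, (2, 4): 2, (3, 4): -2}
adj = {v: {} for v in (1, 2, 3, 4)}
for (a, b), wt in weights.items():
    adj[a][b] = wt
    adj[b][a] = wt

dist, parent = dijkstra_trace(adj, 1)
```

The final parent function has $p(3) = 4$ and $p(4) = 3$, so following parents from Vertex 3 never reaches Vertex 1, which is the failure described in step (5).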


Remark 5.48. The real problem with Dijkstra’s algorithm and negative
edge weights is that a sequence of edges with negative total weight can
be traversed repeatedly. For example, going from Vertex 1 to
Vertex 2 and back to Vertex 1 creates a lower-weight walk than not
leaving Vertex 1 at all. The result is a walk rather than a path. On a
directed graph, this problem may not be as obvious, but the presence
of a directed cycle with negative total length will cause problems.
This is illustrated in Fig. 5.12. In these graphs, there is no shortest
walk at all and the shortest length path (sometimes called a simple
path) can be very hard to find. (The problem is NP-hard, a type of
problem we discuss in Chapter 7.)

Fig. 5.11 The steps of Dijkstra’s algorithm run on the graph in Fig. 5.10.
[Panels: Initialization ($Q = \{1, 2, 3, 4\}$), Iteration 1 ($Q = \{2, 3, 4\}$),
Iteration 2 ($Q = \{3, 4\}$), Iteration 3 ($Q = \{4\}$), Iteration 4 ($Q = \{\}$);
the final parent function yields no solution.]

Fig. 5.12 A negative cycle in a (directed) graph implies that there is no shortest
walk between vertices joined through the cycle, as repeatedly going around the
cycle will make the walk's weight smaller and smaller.


Remark 5.49. The problem of computing shortest paths with negative
edge weights (but no negative cycles) can be solved through the
Floyd–Warshall algorithm. This algorithm assumes a directed graph as
input. The Floyd–Warshall algorithm for a directed graph is shown in
Algorithm 9.
Floyd–Warshall Algorithm

Input: (G, w) a (directed) weighted connected graph with G = (V, E), v_0 an initial
vertex, v_f a final vertex

Initialize: For all (u, v) ∈ V × V, if e = (u, v) ∈ E, then d(u, v) := w(e); otherwise, if
u = v, d(u, v) := 0; otherwise, d(u, v) := ∞. {Here, d(u, v) is the shortest distance from
u to v.}

Initialize: For all (u, v) ∈ V × V, if e = (u, v) ∈ E, then n(u, v) := v; otherwise,
n(u, v) := undefined. {The function n(u, v) is the next vertex to move to when traversing
from u to v along an optimal path.}

Assume: V = {v_1, ..., v_n}


(1) for each i ∈ {1, ..., n}
(2)   for each u_1 ∈ V
(3)     for each u_2 ∈ V
(4)       if d(u_1, v_i) + d(v_i, u_2) < d(u_1, u_2)
(5)         d(u_1, u_2) := d(u_1, v_i) + d(v_i, u_2) {Update the distance.}
(6)         n(u_1, u_2) := n(u_1, v_i) {Update the next step to go through v_i.}
(7)     end for
(8)   end for
(9) end for
(10) for each v ∈ V {Check for negative cycles.}
(11)   if d(v, v) < 0
(12)     RETURN NULL
(13)   end if
(14) end for
(15) Set E’ := ∅
(16) Set V’ := ∅
(17) if n(v_0, v_f) ≠ undefined
(18)   u := v_0
(19)   while u ≠ v_f
(20)     E’ := E’ ∪ {(u, n(u, v_f))}
(21)     V’ := V’ ∪ {u}
(22)     u := n(u, v_f) {Advance to the next vertex on the path.}
(23)   end while
(24)   V’ := V’ ∪ {v_f} {Add the last step in the path.}
(25) end if


Output: P = (V’, E’) and d(·,·) {P is a Floyd–Warshall path from v_0 to v_f, d(·,·)
provides the distances.}


Algorithm 9: Floyd–Warshall algorithm (adapted from Wikipedia’s
pseudocode, https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm).
This algorithm finds the shortest path between two
vertices in a graph with (possibly) negative edge weights.


Remark 5.50. The Bellman–Ford algorithm is an older method that
can also be used to solve this problem. It is less efficient than
Dijkstra’s algorithm. Johnson’s algorithm combines the Bellman–Ford
algorithm with Dijkstra’s algorithm to solve problems with negative
edge weights as well. The Bellman–Ford algorithm is used frequently
in computer science because it has a variant that is distributed, called
the distributed Bellman–Ford algorithm, which is used in developing
computer networks.


Example 5.51. We illustrate the Floyd—Warshall algorithm on the 
graph shown in Fig. 5.13. 

Fig. 5.13 A directed graph with negative edge weights.


Initially, the distance function is defined only for edges in the 
graph and is set to zero for the distance from a vertex to itself. Thus, 
we know that 


(1) d(v_k, v_k) = 0 for k = 1, 2, 3, 4.

(2) d(v_1, v_2) = -2, d(v_1, v_3) = 3, d(v_1, v_4) = ∞.

(3) d(v_2, v_1) = ∞, d(v_2, v_3) = -1, d(v_2, v_4) = 2.

(4) d(v_3, v_1) = ∞, d(v_3, v_2) = ∞, d(v_3, v_4) = -2.

(5) d(v_4, v_1) = ∞, d(v_4, v_2) = ∞, d(v_4, v_3) = ∞.


There are four vertices in this example, so the outer loop will be 
executed four times. There will be a total of 64 comparisons at Line 4, 
and we cannot summarize them all. Instead, we discuss when the 
distance function changes. 


Outer loop with v_1: In the outer loop with v_1, we are interested
in paths that use v_1. Since v_1 has no in-edges, there are no
paths that can be made shorter by passing through v_1. Thus,
no change to the distance function is made.

Outer loop with v_2: In this loop, two things happen:


(1) When u_1 = v_1 and u_2 = v_3, the distance d(v_1, v_3) is updated to
-3 since there is a path of length -3 from v_1 through v_2 to v_3.

(2) When u_1 = v_1 and u_2 = v_4, the distance d(v_1, v_4) is updated to 0
since there is a path of length 0 from v_1 through v_2 to v_4. (Before
this, the distance from v_1 to v_4 was infinite.)

Outer loop with v_3: In this loop, two things happen:


(1) When u_1 = v_1 and u_2 = v_4, the distance from v_1 to v_4 is updated
to -5 since there is a path of length -5 going through v_3
connecting v_1 to v_4.

(2) When u_1 = v_2 and u_2 = v_4, the distance from v_2 to v_4 is updated
to -3.


Outer loop with v_4: In this loop, no further distance improvements
can be made.


The complete distance function has the form: 


(1) d(v_k, v_k) = 0 for k = 1, 2, 3, 4.

(2) d(v_1, v_2) = -2, d(v_1, v_3) = -3, d(v_1, v_4) = -5.

(3) d(v_2, v_1) = ∞, d(v_2, v_3) = -1, d(v_2, v_4) = -3.

(4) d(v_3, v_1) = ∞, d(v_3, v_2) = ∞, d(v_3, v_4) = -2.

(5) d(v_4, v_1) = ∞, d(v_4, v_2) = ∞, d(v_4, v_3) = ∞.
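
The distance function of Example 5.51 can be checked with a small
Python sketch of the triple loop and the next-vertex table. The edge
list below is read off the initial distances in the example (the graph
in Fig. 5.13 is assumed to have exactly these five edges), and the
names floyd_warshall and walk_path are illustrative, not part of
Algorithm 9.

```python
import math


def floyd_warshall(vertices, weights):
    """Return (d, nxt), or None if a negative cycle is detected."""
    d = {(u, v): (0 if u == v else math.inf) for u in vertices for v in vertices}
    nxt = {}
    for (u, v), w in weights.items():
        d[(u, v)] = w
        nxt[(u, v)] = v
    for vi in vertices:              # outer loop over the intermediate vertex
        for u1 in vertices:
            for u2 in vertices:
                if d[(u1, vi)] + d[(vi, u2)] < d[(u1, u2)]:
                    d[(u1, u2)] = d[(u1, vi)] + d[(vi, u2)]
                    nxt[(u1, u2)] = nxt[(u1, vi)]
    if any(d[(v, v)] < 0 for v in vertices):
        return None                  # negative cycle: no shortest walks
    return d, nxt


def walk_path(nxt, start, end):
    """Follow the next-vertex table from start to end (None if unreachable)."""
    if start != end and (start, end) not in nxt:
        return None
    path = [start]
    while path[-1] != end:
        path.append(nxt[(path[-1], end)])
    return path


# Edges assumed from the initial distances listed in Example 5.51.
edges = {(1, 2): -2, (1, 3): 3, (2, 3): -1, (2, 4): 2, (3, 4): -2}
d, nxt = floyd_warshall([1, 2, 3, 4], edges)
print(d[(1, 4)], walk_path(nxt, 1, 4))
```

With this input, the sketch reports a distance of -5 from v_1 to v_4
along the path v_1, v_2, v_3, v_4, matching the completed distance
function above.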


Theorem 5.52. The Floyd–Warshall algorithm is correct. That is,
the path returned connecting v_0 to v_f, the given initial and final
vertices, is the shortest possible in the graph assuming no negative
cycles exist.


Proof. The proof is inductive on the outer for loop of the
Floyd–Warshall algorithm. As in the algorithm statement, assume that
we are provided a weighted directed graph (G, w), with G = (V, E) and
V = {v_1, ..., v_n}.

To execute this proof, we need an auxiliary function: Let u_1 and
u_2 be vertices in the graph, and let V_k = {v_1, ..., v_k} ⊆ V. Let
d_k(u_1, u_2) be a function that returns the (shortest) distance from
u_1 to u_2 using only the vertices in V_k as intermediary steps; that
is, d_k(u_1, u_2) is computed on the graph spanned by V_k, u_1, and u_2.

At the start of the algorithm (the base case), we have not executed
the outermost for loop. For any pair of vertices u_1 and u_2, clearly
d_0(u_1, u_2) returns the length of the shortest path considering only
the vertices u_1 and u_2. Thus, d_0(u_1, u_2) is equivalent to the
function d(·,·) in the Floyd–Warshall algorithm after initialization.

Now, assume that after k iterations of the outermost for loop,
d(·,·) defined in the Floyd–Warshall algorithm is identical to d_k(·,·).
We show that after the (k+1)st iteration, d(·,·) defined in the
Floyd–Warshall algorithm is identical to d_{k+1}(·,·). To see this, note
that at Line 4, we determine whether
d(u_1, v_{k+1}) + d(v_{k+1}, u_2) < d(u_1, u_2);
that is, we determine whether it is more expeditious to reach u_2
from u_1 via v_{k+1}. If not, we do nothing. If so, we update the function
d(·,·) to use this more expeditious path. Since d(·,·) was equivalent
to d_k(·,·) by the induction hypothesis, it’s clear that after the
(k+1)st iteration of the outermost for loop, d(·,·) must be equivalent
to d_{k+1}(·,·) by the construction that takes place at Line 5.

This induction must terminate after n steps, at which point we
must have d(·,·) = d_n(·,·). But this implies that the distances
constructed from the Floyd–Warshall algorithm are correct since d_n(·,·)
is the true graph distance function. The construction of an optimal
path from v_0 to v_f is ensured since n(·,·) respects d(·,·), which we
just proved correct at the algorithm termination.
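
The inductive step can be condensed into a single recurrence. The
display below is an added summary in the notation of the proof (d_k is
the auxiliary function defined there), valid under the theorem’s
assumption that no negative cycles exist; it is not part of the
original argument.

```latex
\[
d_0(u_1, u_2) =
\begin{cases}
w(e) & \text{if } e = (u_1, u_2) \in E, \\
0 & \text{if } u_1 = u_2 \text{ and } (u_1, u_2) \notin E, \\
\infty & \text{otherwise,}
\end{cases}
\qquad
d_{k+1}(u_1, u_2) = \min\bigl\{\, d_k(u_1, u_2),\;
d_k(u_1, v_{k+1}) + d_k(v_{k+1}, u_2) \,\bigr\}.
\]
```

Line 4 of Algorithm 9 tests whether the second term of the minimum is
smaller, and Line 5 stores it when it is.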


Example 5.53 (Application of negative cycle detection).
Suppose that we have n currencies with exchange rate r_{i,j} when
going from Currency i to Currency j. Imagine a scenario in which
we start with $1 of Currency 1, and for some k ≤ n, we have
r_{1,2} r_{2,3} ⋯ r_{k-1,k} r_{k,1} > 1. Then, exchanging Currency 1 for
Currency 2, Currency 2 for Currency 3, etc., will ultimately allow us
to obtain more than $1 of Currency 1. This is called currency
arbitrage. If we assume that we have a digraph G with vertices
V = {v_1, ..., v_n} and with a directed edge (v_i, v_j) for all pairs
(i, j) with i ≠ j, then the transformation from Currency 1 to
Currency 2 to Currency 3 ... to Currency k and back to Currency 1
corresponds to a cycle in G. This is illustrated in Fig. 5.14. Let
w = (v_1, e_1, v_2, ..., v_n, e_n, v_{n+1}) be a directed walk in the
currency graph. Let r_w be the effective exchange rate from currency
v_1 to v_{n+1} so that


\[ r_w = r_{v_1,v_2} \cdot r_{v_2,v_3} \cdots r_{v_n,v_{n+1}}. \]


Note that


\[ \log(r_w) = \log(r_{v_1,v_2}) + \log(r_{v_2,v_3}) + \cdots + \log(r_{v_n,v_{n+1}}). \tag{5.1} \]


Furthermore, if r_{v_i,v_j} < 1, then log(r_{v_i,v_j}) < 0; this occurs
when one unit of currency i is worth less than one unit of currency j.
In finding currency arbitrage, we are interested in finding cycles
with r_w > 1. Give each edge (v_i, v_j) the weight -log(r_{v_i,v_j}).
We can then compute the negative log of the exchange rate of a walk,
path, or cycle by adding these weights. Thus, if there is a cycle
c = (v_1, e_1, v_2, ..., v_n, e_n, v_1) with the property that r_c > 1,
then there is a negative weight cycle in the currency graph with edge
weights -log(r_{v_i,v_j}). Thus, we can use the Floyd–Warshall
algorithm to detect currency arbitrage.


Fig. 5.14 A currency graph showing the possible exchanges. Cycles correspond
to the process of going from one currency to another to another and ultimately
ending up with the starting currency.
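
As a minimal sketch of this idea, the following Python function builds
the negative-log weights from a table of exchange rates and runs the
Floyd–Warshall triple loop, reporting arbitrage when some d(v, v)
becomes negative. The rate tables are invented for illustration, and
the small tolerance tol is an added assumption to guard against
floating-point round-off; neither comes from the text.

```python
import math


def has_arbitrage(rates, tol=1e-12):
    """rates[i][j] is the exchange rate from currency i to currency j (i != j)."""
    n = len(rates)
    # Edge weight -log(r_ij); the diagonal is the zero-length trivial walk.
    d = [[0.0 if i == j else -math.log(rates[i][j]) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative d[v][v] means a cycle through v with rate product > 1.
    return any(d[v][v] < -tol for v in range(n))


# Hypothetical rates: the cycle 0 -> 1 -> 2 -> 0 multiplies to 2.0 * 0.6 * 0.9 = 1.08.
rates_with_arbitrage = [
    [1.0, 2.0, 1.0],
    [0.5, 1.0, 0.6],
    [0.9, 1.6, 1.0],
]

# Hypothetical consistent rates with a 1% spread: every cycle multiplies to < 1.
values = [1.0, 2.0, 4.0]
rates_without = [[0.99 * values[j] / values[i] for j in range(3)] for i in range(3)]

print(has_arbitrage(rates_with_arbitrage), has_arbitrage(rates_without))
```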


5.8 Greedy Algorithms and Matroids 


Definition 5.54 (Power set). Let S be a set, then 2^S is the power
set of S; that is, 2^S is the set of all subsets of S.


Example 5.55. Consider the simple set S = {1,2}. Then, the power
set is


\[ 2^S = \{\emptyset, \{1\}, \{2\}, \{1,2\}\}. \]

Here, ∅ is the empty set, and it is a subset of every set.


Definition 5.56 (Hereditary system). A hereditary system is a
pair (E, ℐ) so that E is a finite set and ℐ ⊆ 2^E is a nonempty set of
independent sets so that if A ∈ ℐ and B ⊆ A, then B ∈ ℐ.


Remark 5.57. Any subset of E that is not in ℐ is called a dependent
set.


Proposition 5.58. Suppose G = (V, E) is a graph, and let ℐ be the
set of subsets of E such that if E’ ∈ ℐ, then the subgraph of G induced
by E’ is acyclic (i.e., a sub-forest of G). Then, (E, ℐ) is a hereditary
system.


Definition 5.59 (Weighted hereditary system). A weighted
hereditary system is a triple (E, ℐ, w) so that (E, ℐ) is a hereditary
system and w: E → ℝ is a weight function on the elements of E.


Example 5.60. In light of Proposition 5.58, we could think of a
weighted graph (G, w) with G = (V, E) as giving rise to a weighted
hereditary system (E, ℐ, w) so that ℐ is the collection of edge subsets
of E that induce acyclic graphs and w is just the edge weighting.


Definition 5.61 (Minimum weight problem). Let (E, ℐ, w) be
a weighted hereditary system. Then, the minimum weight problem is
to identify a set E’ ∈ ℐ (an independent set) such that


\[ w(E') = \sum_{e \in E'} w(e) \tag{5.2} \]


is as small as possible and E’ is a maximal independent subset of E
(that is, there is no other set I ∈ ℐ so that E’ ⊂ I).


Remark 5.62. One can define a maximum weight problem in pre- 
cisely the same way if we replace the word minimum with maximum 
and small with large. Algorithm 10 is called the greedy algorithm, and 
it can be used (in some cases) to solve the minimum weight problem. 
The name “greedy” makes a little more sense for maximum weight 
problems, but the examples we give are minimum weight problems. 


Remark 5.63. Let (G, w) be a weighted graph and consider the
weighted hereditary system (E, ℐ, w), with ℐ the collection of edge
subsets of E that induce acyclic graphs and where w is just the edge
weighting. Kruskal’s algorithm is exactly a greedy algorithm. We
begin with the complete set of edges and continue adding them to
the forest (acyclic subgraph of a given weighted graph (G, w)), each
time checking to make sure that the added edge does not induce a
cycle (that is, that we have an element of ℐ). We use this fact shortly
to prove Theorem 5.38.
Greedy Algorithm

Input: (E, ℐ, w) a weighted hereditary system
Initialize: E’ = ∅

Initialize: A = E


(1) while A ≠ ∅
(2)   Choose e ∈ A to minimize w(e)
(3)   A := A \ {e}
(4)   if E’ ∪ {e} ∈ ℐ
(5)     E’ := E’ ∪ {e}
(6)   end if
(7) end while


Output: E’


Algorithm 10: Greedy algorithm (minimization). 
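
Algorithm 10 can be sketched in Python with the independence test
passed in as a function. Here, the test is_acyclic plays the role of
membership in ℐ for the graphic hereditary system of Example 5.60, so
the greedy loop behaves like Kruskal’s algorithm. The small weighted
graph and the function names are illustrative assumptions, not taken
from the text.

```python
def is_acyclic(edges):
    """Independence oracle: True if the edge set contains no cycle (union-find)."""
    root = {}

    def find(x):
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]  # shorten the chain while walking up
            x = root[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:        # u and v are already connected: the edge closes a cycle
            return False
        root[ru] = rv
    return True


def greedy(elements, w, is_independent):
    """Scan elements by increasing weight and keep those that stay independent."""
    chosen = set()              # E'
    remaining = set(elements)   # A
    while remaining:
        e = min(remaining, key=lambda x: w[x])
        remaining.remove(e)
        if is_independent(chosen | {e}):
            chosen.add(e)
    return chosen


# Hypothetical weighted graph on vertices 1..4 with distinct edge weights.
weights = {(1, 2): 1, (2, 3): 2, (1, 3): 3, (3, 4): 4, (2, 4): 5}
forest = greedy(weights, weights, is_acyclic)
print(sorted(forest))
```

Passing the oracle as a parameter mirrors the abstract setting: only
the test "is E’ ∪ {e} in ℐ?" changes from one hereditary system to
another.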


Definition 5.64 (Matroid). Let M = (E, ℐ) be a hereditary system.
Then, M is a matroid if it satisfies the augmentation property:
If I, J ∈ ℐ and |I| < |J|, then there is some e ∈ E so that e ∈ J and
e ∉ I and so that I ∪ {e} ∈ ℐ.


Remark 5.65. Definition 5.64 essentially says that if there are two 
independent sets (acyclic subgraphs are an example) and one has 
greater cardinality than the other, then there is some element (edge) 
that can be added to the independent set (acyclic graph) with smaller 
cardinality so that this new set is still independent (an acyclic graph). 
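
For small systems, the augmentation property can be checked by brute
force. The sketch below is only an illustration: the two families are
made-up examples (all subsets of {a, b, c} with at most two elements,
and a hereditary family in which {a} cannot be augmented from {b, c}),
and the function name is hypothetical.

```python
from itertools import combinations


def has_augmentation_property(family):
    """Brute-force check of Definition 5.64 on a family of frozensets."""
    for small in family:
        for large in family:
            if len(small) < len(large):
                # Look for some e in large but not in small that keeps independence.
                if not any(small | {e} in family for e in large - small):
                    return False
    return True


ground = ["a", "b", "c"]

# Every subset with at most two elements is independent.
uniform = {frozenset(c) for r in range(3) for c in combinations(ground, r)}

# Hereditary, but {a} cannot be extended by b or c.
not_matroid = {frozenset(s) for s in [(), ("a",), ("b",), ("c",), ("b", "c")]}

print(has_augmentation_property(uniform), has_augmentation_property(not_matroid))
```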


Theorem 5.66. Let (E, ℐ, w) be a weighted hereditary system. The
structure M = (E, ℐ) is a matroid if and only if the greedy algorithm
solves the minimum weight problem associated with M.


Proof. (⇒) Let I = {e_1, ..., e_n} be the set in ℐ identified by the
greedy algorithm, and suppose that J = {f_1, ..., f_m} is any other
maximal element of ℐ. Without loss of generality, assume that


\[ w(e_1) \le w(e_2) \le \cdots \le w(e_n) \quad\text{and}\quad w(f_1) \le w(f_2) \le \cdots \le w(f_m). \]


Assume that |J| > |I|; then, by the augmentation property, there is
an element e ∈ J and not in I such that I ∪ {e} is in ℐ, but this
element would have been identified during execution of the greedy
algorithm. By a similar argument, |J| ≮ |I| since, again by the
augmentation property, we could find an element e ∈ I so that
J ∪ {e} ∈ ℐ; thus, J is not maximal.

Therefore, |I| = |J| or, more specifically, m = n. Assume that
I_k = {e_1, ..., e_k} and J_k = {f_1, ..., f_k} for k = 1, ..., n. (So I_1
and J_1 each has one element, I_2 and J_2 each has two elements, etc.)
It now suffices to show that if


\[ w(I_k) = \sum_{i=1}^{k} w(e_i), \]


then w(I_k) ≤ w(J_k) for all k = 1, ..., n. We proceed by induction.
Since the greedy algorithm selects the element e with the smallest
weight first, it is clear that w(I_1) ≤ w(J_1); thus, we have established
the base case. Now, assume that the statement is true up through
some arbitrary k < n. By definition, we know that |J_{k+1}| > |I_k|;
therefore, by the augmentation property, there is some e ∈ J_{k+1}
with e ∉ I_k so that I_k ∪ {e} is an element of ℐ. It follows that
w(e_{k+1}) ≤ w(e) because otherwise, the greedy algorithm would have
chosen e instead of e_{k+1}. Furthermore, w(e) ≤ w(f_{k+1}) since the
elements of I and J are listed in ascending order and e ∈ J_{k+1}. Thus,
w(e_{k+1}) ≤ w(e) ≤ w(f_{k+1}); therefore, we conclude that
w(I_{k+1}) ≤ w(J_{k+1}). The result follows by induction.

(⇐) We proceed by contrapositive to prove that M is a matroid.
Suppose that the augmentation property is not satisfied, and consider
I and J in ℐ with |I| < |J| so that there is no element e ∈ J with
e ∉ I so that I ∪ {e} is in ℐ. Without loss of generality, assume that
|J| = |I| + 1. Let |I| = n and consider the following weight function:


\[
w(e) =
\begin{cases}
-(n+2) & \text{if } e \in I, \\
-(n+1) & \text{if } e \in J \setminus I, \\
0 & \text{otherwise.}
\end{cases}
\]


After the greedy algorithm chooses all the elements of I, it cannot
decrease the weight of the independent set because only elements
that are not in J will be added. Thus, the total weight will
be -n(n + 2) = -n² - 2n. However, the set J has n + 1 elements, each
of weight at most -(n + 1), so J has a weight of at most
-(n + 1)(n + 1) = -n² - 2n - 1. Thus, any maximal independent set
containing J has a weight of at most -n² - 2n - 1. Thus, the greedy
algorithm cannot identify a maximal independent set with minimum
weight when the augmentation property is not satisfied. Thus, by
contrapositive, we have shown that if the greedy algorithm identifies
a maximal independent set with minimal weight, then M must be a
matroid. This completes the proof.
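
The weight function used in the second half of the proof can be tried
on a tiny example. The hereditary family below is hypothetical: it is
chosen so that I = {a} and J = {b, c} violate the augmentation
property with n = |I| = 1. The loop is a compact restatement of the
greedy algorithm for this one instance.

```python
# Hypothetical hereditary system on E = {a, b, c} that is not a matroid.
family = {frozenset(s) for s in [(), ("a",), ("b",), ("c",), ("b", "c")]}
I = frozenset({"a"})
J = frozenset({"b", "c"})
n = len(I)


def w(e):
    """Weight function from the proof of Theorem 5.66."""
    if e in I:
        return -(n + 2)
    if e in J:          # reached only when e is in J but not in I
        return -(n + 1)
    return 0


# Greedy: scan elements by increasing weight, keep those that stay independent.
chosen = frozenset()
for e in sorted("abc", key=w):
    if chosen | {e} in family:
        chosen = chosen | {e}

greedy_weight = sum(w(e) for e in chosen)
weight_of_J = sum(w(e) for e in J)
print(sorted(chosen), greedy_weight, weight_of_J)
```

In this instance, the greedy algorithm stops at {a} with weight
-n(n + 2) = -3, while the maximal independent set J has weight
-(n + 1)(n + 1) = -4, so the greedy output is not a minimum weight
solution.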


Theorem 5.67. Let G = (V, E) be a graph. Then, the hereditary
system M(G) = (E, ℐ), where ℐ is the collection of subsets that induce
acyclic graphs, is a matroid.


Proof. From Proposition 5.58, we know that (E, ℐ) is a hereditary
system; we must simply show that it has the augmentation property.
To see this, let I and J be two elements of ℐ with |I| < |J|. Let H
be the subgraph of G induced by the edge set I ∪ J. Let F be a
spanning forest of this subgraph H that contains I. We know from
Corollary 4.17 that H has a spanning forest, and we know that
we can construct such a graph using the technique from the proof of
Theorem 4.16.

Since J is acyclic, F has at least as many edges as J; therefore,
there exists at least one edge e that is in the forest F but that does
not occur in the set I; furthermore, it must be an element of J
(by construction of H). Since e is an edge in F, it follows that the
subgraph induced by the set I ∪ {e} is acyclic; therefore, I ∪ {e} is
an element of ℐ. Thus, M(G) has the augmentation property and is
a matroid.


Corollary 5.68 (Theorem 5.38). Let (G,w) be a weighted graph. 
Then, Kruskal’s algorithm returns an MST when G is connected. 


Remark 5.69. Matroid theory is a very active and deep area of 
research in combinatorics and combinatorial optimization theory. 
A complete study of this field is well outside the scope of this book. 
See Ref. [47] for complete details on the subject. 


5.9 Chapter Notes 


This chapter is heavy with topics that would be covered in a computer
science text or in a class on algorithms. As such, most of
the names that appear are associated more with computer science.
Prim’s algorithm was published by Robert Prim in 1957 [48] but
was actually an independent rediscovery of work originally done by
Vojtěch Jarník [49]. Joseph Kruskal published his algorithm in 1956
in the Proceedings of the American Mathematical Society [50].
Ironically, Kruskal’s algorithm was rediscovered independently in 1957 by
Loberman and Weinberger [51]. Dijkstra’s algorithm was originally
published in 1959 [52], though conceived earlier [53]. The
Floyd–Warshall algorithm is sometimes called the Roy–Floyd–Warshall
algorithm, as it was discovered independently three times. Roy
published first in 1959 [54] but in French. Floyd and Warshall both
published independently in 1962 [55,56] but in completely different
contexts. The extensive rediscovery is an indicator of how active
discrete mathematics was in the context of computer science in the
middle of the 20th century.

Matroids were first developed by Hassler Whitney in 1935 [57] 
originally to study generalized notions of linear independence from 
linear algebra (which we will encounter later) but also to develop 
connections to graph theory (covered in this chapter). This work was 
independently developed by Takeo Nakasawa, but this was not known 
for some time [58]. Since then, the relationship between matroids 
(and their generalizations) and algebra, geometry, and graph theory 
has been thoroughly investigated. There is not enough room in this 
note to list all the contributors to this topic. James Oxley’s text [47] 
is a thorough introduction to matroids, though shorter introductions 
with some additional results not presented in this chapter can be 
found in Ref. [59]. Matroids (and their generalizations) are beautiful 
because they connect many areas of discrete math and geometry 
together, but open questions can be fiendishly complex. 


5.10 Exercises 


Exercise 5.1 
Prove Proposition 5.9. [Hint: The proof is almost identical to the 
proof for BFS.] 


Exercise 5.2 
Show that breadth-first search returns a spanning tree with the
property that the path from v_0 to any other vertex has the smallest
length.


Exercise 5.3
Use Prim’s algorithm to find an MST for the following graph. 


Exercise 5.4 
Modify Algorithm 5 so that it returns a minimum spanning forest 
when G is not connected. Prove that your algorithm works. 


Exercise 5.5 
Use Kruskal’s algorithm to determine an MST for the graph from 
Question 5.3. 


Exercise 5.6 

In the graph from Example 5.24, choose a starting vertex other than 
1 and execute Prim’s algorithm to show that Prim’s and Kruskal’s 
algorithms do not always add edges in the same order. 


Exercise 5.7 
Use Kruskal’s algorithm on the following graph. 


[Figure: graph for Exercise 5.7]


Do you obtain a minimum spanning forest? 

Exercise 5.8

Experimentally compare the running time of an implementation of
Kruskal’s algorithm, O(|E| log(|V|)), to the running time of an
implementation of Prim’s algorithm, O(|E| + |V| log(|V|)). Under what
circumstances might you use each algorithm? [Hint: Suppose that
G has n vertices. Think about what happens when |E| is big (say
n(n - 1)/2) versus when |E| is small (say 0). Try plotting the two
cases for various sizes of n.]


Exercise 5.9 
Prove that if the edge weights of a graph are unique (i.e., no two 
edges share an edge weight), then that graph has only one MST. 


Exercise 5.10 

Use Dijkstra’s algorithm to grow a Dijkstra tree for the graph in 
Question 5.3, starting at vertex D. Find the distance from D to each 
vertex in the graph. 


Exercise 5.11 

(Project) The A* heuristic is a variation of Dijkstra’s algorithm, 
which, in the worst case, defaults to Dijkstra’s algorithm. It is fun- 
damental to the study of artificial intelligence. Investigate the A* 
heuristic, describe how it operates, and compare it to Dijkstra’s algo- 
rithm. Create two examples for the use of the A* heuristic, one that 
outperforms Dijkstra’s algorithm and the other that defaults to Dijk- 
stra’s algorithm. You do not have to code the algorithms, but you can. 


Exercise 5.12 

(Project) Using Ref. [28] (or some other book on algorithms), 
implement BFS and DFS (for generating a spanning tree), Prim’s, 
Kruskal’s, and Dijkstra’s algorithm in the language of your choice. 
Write code to generate a connected graph with an arbitrarily large 
number of vertices and edges. Empirically test the running time of 
your three algorithms to see how well the predicted running times 
match the actual running times as a function of the number of ver- 
tices and edges. Using these empirical results, decide whether your 
answer to Question 5.8 was correct or incorrect. 

Exercise 5.13

The fact that the weights are not negative is never explicitly stated
in the proof of the correctness of Dijkstra’s algorithm, but it is used.
Given this example, can you find the statement where it is critical
that the weights be non-negative?


Exercise 5.14 
Compute the running time of Steps 1-9 of the Floyd–Warshall algorithm
(Algorithm 9).


Exercise 5.15 
Prove Proposition 5.58. 


Chapter 6 


An Introduction to Network Flows 
and Combinatorial Optimization 


Remark 6.1 (Chapter goals). The goal of this chapter is to dis- 
cuss flows in networks. We then apply these results to determine 
whether a team can advance to the playoffs in an application to sports 
analysis. We also discuss theoretical results on graph matching. 


Remark 6.2. For the remainder of this chapter, we consider directed
graphs with no isolated vertices and no self-loops. That is, we only
consider those graphs whose incidence matrices do not have any zero
rows. These graphs will be connected and, furthermore, will have two
special vertices v_1 and v_m, and we assume that there is at least one
directed path from v_1 to v_m.


Remark 6.3. For those readers interested in the connection 
between flow problems and linear programming (optimization), see 
Chapter 12 after reading Section 6.1. An introduction to linear pro- 
gramming is provided in Chapter 11. 


6.1 The Maximum Flow Problem 


Definition 6.4 (Flow). Let G = (V, E) be a digraph, and suppose
V = {v_1, ..., v_m} and E = {e_1, ..., e_n}. If e_k = (v_i, v_j) is an edge,
then a flow on e_k is a real value x_k ≥ 0 that determines the amount
of some quantity that will leave v_i and flow along e_k to v_j.


Definition 6.5 (Vertex supply and demand). Let G = (V, E)
be a digraph, and suppose V = {v_1, ..., v_m}. The flow supply for
vertex v_i is a real value b_i assigned to v_i that quantifies the amount
of flow produced at vertex v_i. If b_i < 0, then vertex v_i consumes flow
(rather than producing it).


Definition 6.6 (Flow conservation constraint). Let G = (V, E)
be a digraph, and suppose V = {v_1, ..., v_m} and E = {e_1, ..., e_n}.
Let I(i) be the set of edges with destination vertex v_i and O(i) be the
set of edges with source v_i. Then, the flow conservation constraint
associated to vertex v_i is


\[ \sum_{k \in O(i)} x_k - \sum_{k \in I(i)} x_k = b_i. \tag{6.1} \]


Remark 6.7. Equation (6.1) states that the total flow out of vertex
v_i minus the total flow into v_i must be equal to the total flow
produced (or consumed) at v_i. Put more simply, excess flow is neither
created nor destroyed. This is illustrated in Fig. 6.1.
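
Equation (6.1) is easy to check numerically. The following Python
sketch computes, for each vertex, the flow out minus the flow in minus
the supply, so that a result of zero everywhere means the flow
conservation constraints hold. The five-edge digraph, the flow values,
and the function name are invented for illustration.

```python
def conservation_residuals(vertices, edges, flow, supply):
    """Return (flow out) - (flow in) - b_i for every vertex v_i."""
    residual = {v: -supply.get(v, 0) for v in vertices}
    for k, (src, dst) in enumerate(edges):
        residual[src] += flow[k]   # edge k leaves src
        residual[dst] -= flow[k]   # edge k enters dst
    return residual


# Hypothetical digraph: vertex 1 produces 3 units and vertex 4 consumes them.
vertices = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (2, 3)]
flow = [2, 1, 1, 2, 1]
supply = {1: 3, 4: -3}

print(conservation_residuals(vertices, edges, flow, supply))
```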


Definition 6.8 (Edge capacity). Let G = (V, E) be a digraph,
and suppose V = {v_1, ..., v_m} and E = {e_1, ..., e_n}. If e_k ∈ E, then
its capacity is a real value u_k ≥ 0 that determines the maximum
amount of flow the edge may be assigned.


Definition 6.9 (Maximum flow problem). Let G = (V, E) be a
digraph, and suppose that V = {v_1, ..., v_m}. Let E = {e_1, ..., e_n}.
Assume that the vertex supply and demand for vertices v_2, ..., v_{m-1}
is zero. That is, assume b_2 = b_3 = ⋯ = b_{m-1} = 0. A solution to the
maximum flow problem is the largest vertex supply b_1 so that
b_1 = -b_m and an edge flow x_1, ..., x_n that satisfies all flow
conservation constraints and respects the edge capacities (x_k ≤ u_k
for every edge e_k). Hence, b_1 is the largest possible amount of flow
that can be forced from v_1 to v_m, and x_1, ..., x_n is the flow that
achieves this.


[Figure 6.1 panels: (left) 5 units flow in, x + y units flow out,
b units are consumed, flow conservation: 5 = x + y + b; (right) 5 units
flow in, x + y units flow out, b units are produced, flow conservation:
5 + b = x + y.]


Fig. 6.1 Flow conservation is illustrated. Note that it really doesn’t matter if
flow is produced (b > 0) or consumed (b < 0) or neither (b = 0). The equations for
flow conservation are identical as long as we fix a sign for b when flow is produced
(in this case, b > 0).

6.2 Cuts 


Remark 6.10. Let G = (V, E) be a directed graph, and suppose
V = {v_1, ..., v_m} and E = {e_1, ..., e_n}. Let V_1 be any set of vertices
containing v_1 and not containing v_m, and let V_2 = V \ V_1.
Immediately, we see that v_m ∈ V_2. The edges connecting vertices in V_1
with vertices in V_2 form an edge cut (see Definition 3.40); moreover,
any edge cut that divides G into two components, one containing v_1
and the other containing v_m, corresponds to some sets V_1 and V_2.
Thus, we refer to all such edge cuts by these generated sets; that is,
(V_1, V_2) corresponds to the edge cut defined when v_1 ∈ V_1, v_m ∈ V_2,
V_1 ∩ V_2 = ∅, and V_1 ∪ V_2 = V. For the remainder of this chapter, a
cut refers to a cut of this type. This is illustrated in Fig. 6.2.


Definition 6.11 (Cut capacity). Let G = (V, E) be a directed
graph, and suppose V = {v_1, ..., v_m} and E = {e_1, ..., e_n}. Let
(V_1, V_2) be a cut separating v_1 from v_m, containing edges
e_{s_1}, ..., e_{s_l} with sources in V_1 and destinations in V_2. Here,
s_1, ..., s_l is a subset of the edge indexes 1, ..., n. Then, the
capacity of the cut (V_1, V_2) is


\[ C(V_1, V_2) = \sum_{k=1}^{l} u_{s_k}. \tag{6.2} \]


That is, it is the sum of the capacities of the edges connecting a
vertex in V_1 to a vertex in V_2.


Fig. 6.2 An edge cut constructed by choosing V_1 and V_2 is illustrated. Note that
v_1 ∈ V_1 and v_m ∈ V_2, separating the source from the destination.


Definition 6.12 (Minimum cut problem). Let G = (V, E) be a
directed graph, and suppose V = {v_1, ..., v_m} and E = {e_1, ..., e_n},
with v_1 and v_m being the source and destination vertices,
respectively. The minimum cut problem is to identify a cut (V_1, V_2) with
minimum capacity C(V_1, V_2).
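
For a small graph, Equation (6.2) and the minimum cut problem can be
explored by enumerating every admissible set V_1. The sketch below
does exactly that; it is exponential in the number of vertices and is
meant only as an illustration. The digraph, its capacities, and the
function names are hypothetical.

```python
from itertools import combinations


def cut_capacity(edges, capacity, V1):
    """Sum of capacities of edges with source in V1 and destination outside V1."""
    return sum(capacity[k] for k, (src, dst) in enumerate(edges)
               if src in V1 and dst not in V1)


def min_cut(vertices, edges, capacity, source, sink):
    """Try every V1 containing the source but not the sink; return (capacity, V1)."""
    inner = [v for v in vertices if v not in (source, sink)]
    best = None
    for r in range(len(inner) + 1):
        for extra in combinations(inner, r):
            V1 = {source, *extra}
            c = cut_capacity(edges, capacity, V1)
            if best is None or c < best[0]:
                best = (c, V1)
    return best


# Hypothetical digraph with source 1 and destination 4.
vertices = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (2, 3)]
capacity = [2, 2, 1, 2, 1]

print(min_cut(vertices, edges, capacity, 1, 4))
```

Note that the edge (2, 3) does not count toward the cut with
V_1 = {1, 3} because its source lies in V_2; only edges leaving V_1
contribute to C(V_1, V_2).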


Lemma 6.13 (Weak duality). Let G = (V, E) be a directed
graph, and suppose V = {v_1, ..., v_m} and E = {e_1, ..., e_n}. Let
(b_1^*, x_1^*, ..., x_n^*) be a solution to the maximum flow problem, where
b_1^* is the maximum flow. Let (V_1^*, V_2^*) be a solution to the minimum
cut problem. Then, b_1^* ≤ C(V_1^*, V_2^*).


Proof. Flow conservation ensures that the total flow out of V_1^*
must be equal to the total flow into V_2^* because b_2 = b_3 = ⋯ =
b_{m-1} = 0. Therefore, the maximum flow b_1^* must be forced across
the edge cut (V_1^*, V_2^*). The edges leaving V_1^* and going to V_2^* have
a capacity of C(V_1^*, V_2^*). Therefore, it is impossible to push more
than C(V_1^*, V_2^*) from V_1^* to V_2^*. Consequently,
b_1^* ≤ C(V_1^*, V_2^*).


6.3 The Max-Flow/Min-Cut Theorem


Lemma 6.14. In any optimal solution to a maximum flow problem,
every directed path from v_1 to v_m must have at least one edge at
capacity.


Proof. By way of contradiction, suppose there is a directed path
from v_1 to v_m on which the flow on each edge is not at capacity.
Suppose this path contains edges from the set J ⊆ {1, ..., n}. That
is, the flow values of this path are x_{j_1}, ..., x_{j_l}, where |J| = l.
Then, x_{j_i} < u_{j_i} for each i = 1, ..., l. Let


\[ \Delta = \min_{i \in \{1, \dots, l\}} \left( u_{j_i} - x_{j_i} \right). \]


Replacing x_{j_i} by x_{j_i} + Δ and b_1 by b_1 + Δ (and b_m by b_m - Δ)
maintains flow conservation and increases the maximum flow by Δ.
Therefore, we could not have started with a maximum flow. This
completes the proof.
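
The augmentation step in this proof can be sketched in a few lines of
Python. Given a path on which no edge is at capacity, the function
computes Δ as the smallest slack and adds it to the flow on every edge
of the path. The digraph, capacities, starting flow, and function name
are illustrative assumptions.

```python
def augment_along_path(flow, capacity, path_edges):
    """Increase the flow on each listed edge by the smallest slack on the path."""
    delta = min(capacity[k] - flow[k] for k in path_edges)
    new_flow = list(flow)
    for k in path_edges:
        new_flow[k] += delta
    return new_flow, delta


# Hypothetical digraph with source 1 and destination 4; edges are indexed 0..4.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (2, 3)]
capacity = [2, 2, 1, 2, 1]
flow = [1, 1, 0, 2, 1]      # a feasible flow of value 2 from vertex 1 to vertex 4

# The directed path 1 -> 2 -> 4 uses edges 0 and 2, neither of which is at capacity.
new_flow, delta = augment_along_path(flow, capacity, [0, 2])
print(new_flow, delta)
```

In this instance Δ = 1, and the value of the flow out of the source
rises from 2 to 3 while every intermediate vertex still satisfies flow
conservation.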


Theorem 6.15. Let G = (V, E) be a directed graph, and suppose
V = {v_1, ..., v_m} and E = {e_1, ..., e_n}. There is at least one cut
(V_1, V_2) so that the maximum flow from v_1 to v_m is equal to the
capacity of the cut (V_1, V_2).


Proof. Let (b_1^*, x^*) be a solution to the maximum flow problem,
which we know must exist. Here, x^* = (x_1^*, ..., x_n^*).

By Lemma 6.14, we know that in this solution, every directed
path from v_1 to v_m must have at least one edge at capacity. From
each path from v_1 to v_m, select an edge that is at capacity in such a
way that we minimize the total sum of the capacities of the chosen
edges. Denote this set of edges as E’.

If E’ is not yet an edge cut in the underlying graph of G, then
there are some paths from v_1 to v_m in the underlying graph of G
that are not directed paths from v_1 to v_m. In each such path, there
is at least one edge directed from v_m toward v_1 (otherwise, we would
have added another edge to the cut). Choose one edge from each of
these paths directed from v_m to v_1 to minimize the total number
of edges chosen, and add these edges to E’ (see Fig. 6.3).

Let V_1 be the set of vertices reachable from v_1 by a simple path
in the underlying (undirected) graph G - E’, and let V_2 be the set of
vertices reachable from v_m by a simple path in the underlying graph
G - E’. This construction is illustrated in Fig. 6.3.


Claim 1. Every vertex is either in V_1 or V_2 using the definition of
E’, and thus, the set E’ = (V_1, V_2) is an edge cut in the underlying
graph of G.


Proof. See Question 6.1. 


Suppose E’ = {e_{s_1}, ..., e_{s_l}}.


[Figure 6.3 labels: V_1, V_2, "At capacity", "Imaginary Arc".]


Fig. 6.3 A cut is defined as follows: In each directed path from v_1 to v_m, we
choose an edge at capacity so that the collection of chosen edges has minimum
capacity (and flow). If this set of edges is not an edge cut of the underlying graph,
we add edges that are directed from v_m to v_1 in a simple path from v_1 to v_m in
the underlying (undirected) graph of G.


Claim 2. If there is some edge $e_k$ with source in $V_2$ and destination
in $V_1$, then $x_k = 0$.


Proof. If $x_k \neq 0$, we could reduce this flow to zero and increase the
net flow from $v_1$ to $v_m$ by adding this flow to $b_1$. If the flow cannot
reach $v_1$ along $e_k$ (illustrated by the middle path in Fig. 6.3), then
flow conservation ensures it must be equal to zero.


Claim 3. The total flow from $v_1$ to $v_m$ must be equal to the capacity
of the edges in $E'$ that have source in $V_1$ and destination in $V_2$.


Proof. We've established that if there is some edge $e_k$ with source
in $V_2$ and destination in $V_1$, then $x_k = 0$. Thus, the flow from
$v_1$ to $v_m$ must traverse edges leaving $V_1$ and entering $V_2$. Thus,
the flow from $v_1$ to $v_m$ must be equal to the capacity of the cut
$E' = (V_1, V_2)$.


Claim 3 establishes that the flow $b_1^*$ must be equal to the capacity
of a cut $(V_1, V_2)$. This completes the proof of the theorem.
Corollary 6.16 (Max-flow/min-cut theorem). Let $G = (V, E)$
be a directed graph, and suppose $V = \{v_1,\dots,v_m\}$ and $E =
\{e_1,\dots,e_n\}$. Then, the maximum flow from $v_1$ to $v_m$ is equal to the
capacity of the minimum cut separating $v_1$ from $v_m$.


Proof. By Theorem 6.15, if $(b_1^*, x^*)$ is a maximum flow in $G$ from
$v_1$ to $v_m$, then there is a cut $(V_1, V_2)$ so that the capacity of this
cut is equal to $b_1^*$. Since $b_1^*$ is bounded above by the capacity of
the minimal cut separating $v_1$ from $v_m$, the cut constructed in the
proof of Theorem 6.15 must be a minimal capacity cut. Thus, the
maximum flow from $v_1$ to $v_m$ is equal to the capacity of the minimum
cut separating $v_1$ from $v_m$.
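As a small numerical check of Corollary 6.16, the following Python sketch enumerates every cut of a tiny network by brute force and compares the smallest cut capacity with the value of a feasible flow. The network, its capacities, the flow, and the function names are illustrative assumptions; they are not taken from the figures in this chapter. Brute-force enumeration is exponential in the number of vertices and is only sensible for toy examples.

```python
from itertools import combinations


def cut_capacity(cap, side):
    """Total capacity of edges leaving `side` (the set V1 containing the source)."""
    return sum(c for (u, v), c in cap.items() if u in side and v not in side)


def min_cut(cap, source, sink):
    """Try every vertex set that contains the source but not the sink."""
    inner = sorted({v for edge in cap for v in edge} - {source, sink})
    best_value, best_side = None, None
    for r in range(len(inner) + 1):
        for extra in combinations(inner, r):
            side = {source, *extra}
            value = cut_capacity(cap, side)
            if best_value is None or value < best_value:
                best_value, best_side = value, side
    return best_value, best_side


def flow_value(flow, source):
    """Net flow leaving the source."""
    out = sum(f for (u, _), f in flow.items() if u == source)
    back = sum(f for (_, v), f in flow.items() if v == source)
    return out - back


# Hypothetical network: keys are directed edges, values are capacities.
cap = {(1, 2): 2, (1, 3): 3, (3, 2): 1, (2, 4): 3, (3, 4): 1}
# A feasible flow on the same edges (it conserves flow at vertices 2 and 3).
flow = {(1, 2): 2, (1, 3): 2, (3, 2): 1, (2, 4): 3, (3, 4): 1}

value, side = min_cut(cap, source=1, sink=4)
print(flow_value(flow, 1), value, side)
```

Because no flow can exceed the capacity of any cut, a flow whose value equals the capacity of some cut certifies that the flow is maximum and the cut is minimum.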


6.4 An Algorithm for Finding Optimal Flow 


Remark 6.17. The proof of the max-flow/min-cut theorem we pre- 
sented is a bit of a nonstandard proof technique. Most techniques 
are constructive; that is, they specify an algorithm for generating a 
maximum flow and then show that this maximum flow must be equal 
to the capacity of the minimal cut. In this section, we develop this 
algorithm and show that it generates a maximum flow and then (as 
a result of the max-flow/min-cut theorem) this maximum flow must 
be equal to the capacity of the minimum cut. 


Definition 6.18 (Augment). Let $G = (V, E)$ be a directed graph,
and suppose $V = \{v_1,\dots,v_m\}$ and $E = \{e_1,\dots,e_n\}$. Let $x$ be a
feasible flow in $G$. Consider a simple path $p = (v_1, e_1,\dots,e_l, v_m)$ in
the underlying graph of $G$ from $v_1$ to $v_m$. The augment of $p$ is the
quantity

$$\min_{k \in \{1,\dots,l\}} \begin{cases} u_k - x_k & \text{if the edge } e_k \text{ is directed toward } v_m, \\ x_k & \text{otherwise.} \end{cases} \qquad (6.3)$$

Definition 6.19 (Augmenting path). Let $G = (V,E)$ be a
directed graph, and suppose $V = \{v_1,\dots,v_m\}$ and $E = \{e_1,\dots,e_n\}$.
Let $x$ be a feasible flow in $G$. A simple path $p$ in the underlying graph
of $G$ from $v_1$ to $v_m$ is an augmenting path if its augment is nonzero.
In this case, we say that flow $x$ has an augmenting path.

Fig. 6.4 Two flows with augmenting paths and one with no augmenting paths
are illustrated. (Panel labels: Augment = 1; Augment = 0.25; No augmenting
path, Augment = 0. Edge labels are flow/capacity.)


Example 6.20. An example of augmenting paths is shown in
Fig. 6.4. An augmenting path is simply an indicator that more flow
can be pushed from vertex $v_1$ to vertex $v_m$. For example, in the flow
on the bottom left of Fig. 6.4, we could add an additional unit of
flow on the edge $(v_1, v_3)$. This one unit could flow along edge $(v_3, v_2)$
and then along edge $(v_2, v_4)$. Augmenting paths that are augmenting
solely because of flow on a backward edge (one directed from $v_m$
toward $v_1$) can also be used to increase the net flow from $v_1$ to $v_m$
by removing flow along the backward edge.


Definition 6.21 (Path augment). If $p$ is an augmenting path in
$G$ with augment $\Delta$, then by augmenting $p$ by $\Delta$, we mean adding $\Delta$
to the flow in each edge directed from $v_1$ toward $v_m$ and subtracting $\Delta$
from the flow in each edge directed from $v_m$ to $v_1$.
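Definitions 6.18 and 6.21 can be sketched in a few lines of Python. The sketch below assumes that a path is given as a list of vertices, that capacities and flows are stored in dictionaries keyed by directed edge, and that the graph has no pair of antiparallel edges; the function names and the example network are hypothetical and chosen only for illustration.

```python
def path_augment(path, cap, flow):
    """Augment of a path in the underlying graph, as in Eq. (6.3)."""
    slack = []
    for a, b in zip(path, path[1:]):
        if (a, b) in cap:        # edge directed toward v_m: unused capacity
            slack.append(cap[(a, b)] - flow[(a, b)])
        elif (b, a) in cap:      # edge directed toward v_1: flow we can remove
            slack.append(flow[(b, a)])
        else:
            raise ValueError(f"no edge joins {a} and {b}")
    return min(slack)


def augment_along(path, cap, flow):
    """Return the new flow obtained by augmenting the path by its augment."""
    delta = path_augment(path, cap, flow)
    new_flow = dict(flow)
    for a, b in zip(path, path[1:]):
        if (a, b) in cap:
            new_flow[(a, b)] += delta   # forward edge: add delta
        else:
            new_flow[(b, a)] -= delta   # backward edge: subtract delta
    return new_flow, delta


# Hypothetical network and feasible flow; the path 1-2-3-4 uses edge (3, 2) backward.
cap = {(1, 2): 2, (1, 3): 3, (3, 2): 1, (2, 4): 3, (3, 4): 1}
flow0 = {(1, 2): 1, (1, 3): 1, (3, 2): 1, (2, 4): 2, (3, 4): 0}
flow1, delta = augment_along([1, 2, 3, 4], cap, flow0)
print(delta, flow1)
```

In this example the flow on the backward edge (3, 2) drops from 1 to 0 while the net flow leaving vertex 1 rises from 2 to 3, and flow conservation still holds at vertices 2 and 3.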


Example 6.22. Figure 6.5 shows the result of augmenting the flows 
shown in Example 6.20. 


Remark 6.23. Algorithm 11, sometimes called the Edmonds-Karp
algorithm, finds a maximum flow in a network by discovering and
removing all augmenting paths.

Fig. 6.5 The result of augmenting the flows shown in Fig. 6.4.


Maximum Flow Algorithm
Input: $(G,u)$ a weighted directed graph with $G = (V, E)$, $V = \{v_1,\dots,v_m\}$,
$E = \{e_1,\dots,e_n\}$

Initialize: $x = 0$ {Initialize all flow variables to zero.}

(1) Find the shortest augmenting path $p$ in $G$ using the current flow $x$
(2) if no augmenting path exists then STOP
(3) else augment the flow along path $p$ to produce a new flow $x$
(4) end if
(5) GOTO (1)

Output: $x^*$


Algorithm 11: Maximum flow algorithm.
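Algorithm 11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: capacities are stored in a dictionary keyed by directed edge, the shortest augmenting path is found by a breadth-first search that may traverse an edge forward (if it has unused capacity) or backward (if it carries flow), and the function names and the example network are hypothetical.

```python
from collections import deque


def shortest_augmenting_path(cap, flow, source, sink):
    """Line (1): breadth-first search for an augmenting path with the fewest edges.

    Returns a list of (edge, sign) pairs, with sign +1 for an edge used in its
    own direction and -1 for an edge used backward, or None if no path exists.
    """
    parent = {source: None}
    queue = deque([source])
    while queue and sink not in parent:
        a = queue.popleft()
        for (u, v), c in cap.items():
            if u == a and v not in parent and flow[(u, v)] < c:
                parent[v] = ((u, v), +1)
                queue.append(v)
            elif v == a and u not in parent and flow[(u, v)] > 0:
                parent[u] = ((u, v), -1)
                queue.append(u)
    if sink not in parent:
        return None
    steps, node = [], sink
    while parent[node] is not None:
        edge, sign = parent[node]
        steps.append((edge, sign))
        node = edge[0] if sign == +1 else edge[1]
    return steps[::-1]


def max_flow(cap, source, sink):
    """Lines (1)-(5): augment along shortest paths until none remains."""
    flow = {edge: 0 for edge in cap}            # Initialize: x = 0
    while True:
        steps = shortest_augmenting_path(cap, flow, source, sink)
        if steps is None:                       # Line (2): STOP
            return flow
        delta = min(cap[e] - flow[e] if sign > 0 else flow[e] for e, sign in steps)
        for e, sign in steps:                   # Line (3): augment the flow
            flow[e] += sign * delta


# Hypothetical network with integer capacities.
cap = {(1, 2): 2, (1, 3): 3, (3, 2): 1, (2, 4): 3, (3, 4): 1}
flow = max_flow(cap, source=1, sink=4)
print(flow, sum(f for (u, _), f in flow.items() if u == 1))
```

Scanning every edge for each vertex taken from the queue keeps the sketch short; an adjacency-list representation would be the usual choice for larger graphs.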
Example 6.24. We illustrate an example of the Edmonds-Karp
algorithm in Fig. 6.6. Note that the capacity of the minimum cut
is equal to the total flow leaving Vertex 1 and flowing to Vertex 4 at
the completion of the algorithm.

Fig. 6.6 The Edmonds-Karp algorithm iteratively augments flow on a graph
until no augmenting paths can be found. An initial zero-feasible flow is used to
start the algorithm. Note that the capacity of the minimum cut is equal to the
total flow leaving Vertex 1 and flowing to Vertex 4. (Panels: Step 0 and first
augmenting path; Step 1 and second augmenting path; Step 2 and third
augmenting path; no more augmenting paths, STOP. Edge labels are
flow/capacity. Max flow = 2 + 2 = 4; min cut capacity = 4.)


Remark 6.25. The Edmonds-Karp algorithm is a specialization
(and correction) of the Ford-Fulkerson algorithm, which does not
specify how the augmenting paths in Line 1 are chosen.


Lemma 6.26. Let $G = (V, E)$ be a directed graph, and suppose $V =
\{v_1,\dots,v_m\}$ and $E = \{e_1,\dots,e_n\}$. A feasible flow $x^*$ is optimal
if and only if it does not have an augmenting path.


Proof. Our proof is by abstract example. Without loss of generality,
consider Fig. 6.7. Suppose there is a nonzero augment in the
path shown in Fig. 6.7. If the flow $f_1$ is below capacity $c_1$, and this
is the augment, then we can increase the total flow along this path
by increasing the flow on each edge in the direction of $v_m$ (from $v_1$)
by $\Delta = c_1 - f_1$ and decreasing the flow on each edge in the direction
of $v_1$ (from $v_m$) by $\Delta$. Flow conservation at each vertex on the path
is preserved since we see that

$$f_1 + f_2 - f_5 = 0 \implies (f_1 + \Delta) + (f_2 - \Delta) - f_5 = 0 \qquad (6.4)$$

and

$$f_3 + f_2 - f_4 = 0 \implies (f_3 + \Delta) + (f_2 - \Delta) - f_4 = 0 \qquad (6.5)$$

as well as

$$b_1 - f_4 - f_6 - f_1 = 0 \implies (b_1 + \Delta) - f_4 - f_6 - (f_1 + \Delta) = 0 \qquad (6.6)$$

$$f_5 + f_6 + f_3 + b_m = 0 \implies f_5 + f_6 + (f_3 + \Delta) + (b_m - \Delta) = 0. \qquad (6.7)$$

Remember that $b_m < 0$ because it is a demand.

Fig. 6.7 Illustration of the impact of an augmenting path on the flow from $v_1$
to $v_m$.

The same argument holds if the flow $f_2 > 0$, and this edge
produces the nonzero augment. In this case, we can increase the
total flow by decreasing the flow on each edge in the direction of $v_1$
(from $v_m$) by $\Delta = f_2$ and increasing the flow on each edge in the
direction of $v_m$ (from $v_1$) by $\Delta$. Thus, if an augmenting path exists,
the flow cannot be maximal.

Conversely, suppose we do not have a maximal flow. Then, by the 
max-flow/min-cut theorem, the flow across the minimal edge cut is 
not equal to its capacity. Thus, there is some edge in the minimal edge 
cut whose flow can be increased. Thus, there must be an augmenting 
path. This completes the proof. 


Remark 6.27. The proof of Lemma 6.26 also illustrates that Algo- 
rithm 11 maintains flow feasibility as it is executed. 


Remark 6.28. A proof that Algorithm 11 terminates is a bit complicated.
The main problem is showing that repeatedly augmenting paths (adding
or subtracting flow) does not lead to an infinite cycle of augmentations.
The proof can be found in Ref. [59], where the running time is
also proved.
Theorem 6.29. Algorithm 11 terminates in $O(mn^2)$ time.


Theorem 6.30. At the completion of Algorithm 11, there are no 
augmenting paths and the flow x* is optimal. 


Proof. To see that x* is feasible, note that we never increase the 
flow along any path by more than the maximum amount possible to 
ensure feasibility in all flows, and a flow is never decreased beyond 
zero. This is ensured in our definition of augment. We start with a 
feasible (zero) flow; therefore, the flow remains feasible throughout 
Algorithm 11. 

To prove optimality, suppose at the completion of Algorithm 11, 
there was an augmenting path p. If we execute Line 1 of the algo- 
rithm, we will detect that augmenting path. Thus, no augmenting 
path exists at the conclusion of Algorithm 11, and by Lemma 6.26, 
x* is optimal. 


Corollary 6.31 (Integral flow theorem). If the capacities of a
network are all integers, then there exists an integral maximum flow.


Remark 6.32. It is worth noting that the original form of 
Algorithm 11 did not specify which augmenting path to find. This 
leads to a pathological condition in which the algorithm occasionally 
will not terminate. This is detailed in Ford and Fulkerson’s original 
paper and more recently in Ref. [60]. The shortest augmenting path 
can be found using a breadth-first search on the underlying graph. 
This breadth-first search is what leads to the proof of Theorem 6.29. 


6.5 Applications of the Max-Flow/Min-Cut Theorem 


Remark 6.33. Consider the following scenario: A baseball team 
wins the pennant if it obtains more wins than any other team in 
its division. (A similar structure can be observed in hockey, except 
this partially determines playoff eligibility.) At the start of the sea- 
son, any team can win the pennant; however, as play continues, it 
occasionally becomes mathematically impossible for a team to win 
the pennant because of the number of losses they have incurred and 
the remaining schedule of games to be played. Determining whether 
a team can still win the pennant is an interesting mathematical problem
that can be phrased as a max-flow problem. For simplicity, we
ignore modern elements such as wild card spots, and we assume that 
if two teams tie in wins, they are still playoff (pennant) eligible and 
that they will play a tie-breaker game (series) in the postseason. 


Example 6.34. Consider the following league standings. 


against 
ATL | 8 | 72 | 9 | -|2 )s] 2 | 


ATL | 
Ppa [si_[ 7 | 4 | 2/]- ]o]2_ 
Day [7 [7 [5 [s fo ]-|0_ 
PMON | 7% [si [| 4 [| 2]2 ]0|-_ 


The against columns provide specific information on the remaining 
games to be played. It is clear that Montreal has been eliminated 
from the playoffs (or winning the division) because, with 76 games 
won and only four games left to play, they can never catch up to 
leader Atlanta. On the other hand, consider the following alternative 
league standings. 


(Sees seat! ene ete Against 
Pay [am] o | 2» |-,;2,;7]s|o 
Perr fm [2 [1 [2,-|,;4][s | 


Bos |e | er_| 7 |[7]a4{[- [ss 
PORE |= 662 We [or ee eee 
Pper[ | #7 | 7 [s{[7][s{[s]-_ 


We'd like to know if Detroit can still win the division. It certainly 
seems that if Detroit (amazingly) won every remaining game, it could 
come out ahead of New York if New York lost every game, but is 
that possible? It seems that the only way to figure this out is to put 
together all possible combinations of game wins and losses and see if 
there is some way Detroit can succeed in taking the pennant. This is 
easy for a computer (though time-consuming) and all but impossible 
for the average sports fan scanning her morning paper. A simpler 
way is to phrase the problem as a maximum flow problem. Consider 
the figure shown in Fig. 6.8.

Fig. 6.8 Games to be played flow from an initial vertex $s$ (playing the role of
$v_1$). From here, they flow into the actual game events illustrated by vertices (e.g.,
NY-BOS for New York vs. Boston). Wins flow across the infinite-capacity edges
to team vertices. From here, the games all flow to the final vertex $t$ (playing the
role of $v_m$). (Figure annotations: Detroit could win as many as 48 + 27 = 75
games. Edges leaving $s$ have flow capacity equal to the games remaining; edges
entering $t$ have flow capacity equal to the games Detroit can win minus the
games the other team has won: 75 - 74 = 1, 75 - 72 = 3, 75 - 68 = 7, and
75 - 64 = 11.)


In Fig. 6.8, the games to be played flow from an initial vertex
$s$ (playing the role of $v_1$). From here, they flow into the actual
game events illustrated by vertices (e.g., NY-BOS for New York
vs. Boston). Wins (and losses) occur, and the wins flow across the
infinite-capacity edges to team vertices. From here, the games all
flow to the final vertex $t$ (playing the role of $v_m$). Edges going from
$s$ to the game vertices have capacity equal to the number of games
left to be played between the two teams in the game vertex. This
makes sense; we cannot assign more games to that edge than can
be played. Edges crossing from the game vertices to the team vertices
have unbounded capacity; the values we assign them will be
bounded by the number of games the team plays in the game vertices
anyway. Edges going from the team vertices to the final vertex $t$
have capacity equal to the number of games Detroit can win minus
the games the team whose vertex the edge leaves has already won.
This tells us that for Detroit to come out on top (or with more wins
than any other team), the number of wins assigned to a team cannot
be greater than the number of wins Detroit can amass (at best). We
use Detroit's possible wins because we want to know if Detroit can
still win the pennant. If you were interested in another team, you
would use the statistics from that team instead.

If the maximum flow in this graph fully saturates the edges leaving
$s$, then there is an assignment of games so that Detroit can still finish
first. On the other hand, if the edges connecting the team vertices to
$t$ form the minimum cut and the edges leaving $s$ are not saturated,
then there is no way to assign wins to Detroit to ensure that it wins
more games than any other team (or at best ties). The maximum
flow in this example is shown in Fig. 6.9. From this figure, we see
that Detroit cannot make the playoffs. There is no way to assign
all remaining games and for Detroit to have the most wins of any
team (or to at least tie). This is evident since the edges leaving $s$ are
not saturated. Note that this approach to baseball analysis was first
studied by Schwartz [61].

Fig. 6.9 Optimal flow was computed using the Edmonds-Karp algorithm. Note
that a minimum-capacity cut consists of the edges entering $t$ and not all edges
leaving $s$ are saturated. Detroit cannot make the playoffs. (Edge labels in the
figure give flow at optimality/capacity.)


Remark 6.35. Consider a score table for a team sport with n 
teams and with playoff rules like those discussed in Remark 6.33. We 
refer to P(k) as the maximum flow problem constructed for team k 
(k =1,...,n), as in Example 6.34. 


Proposition 6.36. If the maximum flow for Problem $P(k)$ saturates
all edges leaving vertex $s$, then team $k$ is playoff eligible. Otherwise,
team $k$ has been eliminated.
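The construction of $P(k)$ and the test in Proposition 6.36 can be sketched in Python. The standings below are hypothetical (four teams labeled A through D) and are not the standings of Example 6.34. The helper `max_flow_value` is a compact shortest-augmenting-path routine written on a residual graph, which is a standard reformulation of the augmenting paths used in this chapter; all function names are assumptions made for illustration. The "unbounded" game-to-team edges are given capacity equal to the total number of games in the network, which can never be binding.

```python
from collections import defaultdict, deque


def max_flow_value(capacity, s, t):
    """Shortest-augmenting-path max flow on a residual graph; returns the value."""
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0  # make the reverse (residual) arc visible to the search
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for b, room in residual[a].items():
                if room > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            return total
        path, node = [], t
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        delta = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= delta
            residual[b][a] += delta
        total += delta


def can_still_win(team, wins, remaining):
    """True if `team` can finish with at least as many wins as every other team."""
    best = wins[team] + sum(g for pair, g in remaining.items() if team in pair)
    others = [x for x in wins if x != team]
    if any(wins[x] > best for x in others):
        return False                      # already out of reach
    games = {p: g for p, g in remaining.items() if team not in p and g > 0}
    big = sum(games.values())             # stands in for unbounded capacity
    capacity = {}
    for (i, j), g in games.items():
        capacity[("s", (i, j))] = g       # games left between i and j
        capacity[((i, j), i)] = big       # a win may go to either team
        capacity[((i, j), j)] = big
    for x in others:
        capacity[(x, "t")] = best - wins[x]
    return max_flow_value(capacity, "s", "t") == big


# Hypothetical standings: current wins and games left between each pair of teams.
wins = {"A": 83, "B": 80, "C": 78, "D": 77}
remaining = {("A", "B"): 1, ("A", "C"): 6, ("A", "D"): 1,
             ("B", "C"): 0, ("B", "D"): 2, ("C", "D"): 0}
print({team: can_still_win(team, wins, remaining) for team in wins})
```

In this made-up table, team D is out because team A already has more wins than D can reach, while team B is eliminated only by the flow argument: the six games between A and C cannot all be assigned without pushing A or C past B's best possible total.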


6.6 More Applications of the Max-Flow/Min-Cut 
Theorem 


Remark 6.37. The following theorem, Menger’s first theorem, can 
be proved directly using the max-flow/min-cut theorem. 


Theorem 6.38 (Menger's first theorem). Let $G$ be an
(undirected) graph with $V = \{v_1,\dots,v_m\}$. Then, the maximum number of
edge-disjoint paths from $v_1$ to $v_m$ is equal to the size of the smallest edge
cut separating $v_1$ from $v_m$.


Theorem 6.39 (Menger's second theorem). Let $G = (V, E)$ be
a directed graph. Let $v_1$ and $v_2$ be two nonadjacent and distinct vertices
in $V$. The maximum number of vertex-disjoint directed paths
from $v_1$ to $v_2$ is equal to the minimum number of vertices (excluding
$v_1$ and $v_2$) whose deletion will destroy all directed paths from $v_1$ to $v_2$.


Proof. We construct a new graph by replacing each vertex $v$ in $G$
by two vertices $v'$ and $v''$ and an edge $(v', v'')$. Each edge $(v, w)$ is
replaced by the edge $(v'', w')$, while each edge $(u, v)$ is replaced by
$(u'', v')$, as illustrated in the following.

[Figure: the vertex-splitting construction.]

Note that each arc of the form $(v', v'')$ corresponds to a vertex in
$G$. Thus, edge-disjoint paths in the constructed graph correspond to
vertex-disjoint paths in the original graph. The result follows from
Menger's first theorem.
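The vertex-splitting construction in this proof is easy to state in code. In the Python sketch below, the copy $v'$ is represented by the pair `(v, "in")` and the copy $v''$ by `(v, "out")`; this labeling and the function name are assumptions made for illustration.

```python
def split_vertices(vertices, edges):
    """Replace each vertex v by an arc v' -> v'' and each edge (u, w) by u'' -> w'."""
    internal = [((v, "in"), (v, "out")) for v in vertices]      # arcs (v', v'')
    external = [((u, "out"), (w, "in")) for u, w in edges]      # arcs (u'', w')
    return internal + external


# A directed path 1 -> 2 -> 3 becomes a path through six vertices.
for arc in split_vertices([1, 2, 3], [(1, 2), (2, 3)]):
    print(arc)
```

Any path in the new graph that passes through the original vertex $v$ must use the single arc from `(v, "in")` to `(v, "out")`, which is why edge-disjoint paths in the new graph cannot share a vertex of the original graph.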


Definition 6.40 (Matching). A matching in a graph $G = (V, E)$
is a subset $M$ of $E$ such that no two edges in $M$ share a vertex in
common. A matching is maximal if there is no other matching in
$G$ containing it. A matching has maximum cardinality if there is no
other matching of $G$ with more edges. A matching is perfect
if every vertex is an end point of an edge in the matching.
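The three properties in Definition 6.40 can be checked directly. The following Python sketch assumes that edges are given as pairs of vertices; the function names and the four-vertex path used as an example are hypothetical.

```python
def is_matching(M):
    """No two edges of M share a vertex."""
    seen = set()
    for u, v in M:
        if u == v or u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_maximal(M, E):
    """M is a matching and no further edge of E can be added to it."""
    covered = {x for edge in M for x in edge}
    return is_matching(M) and all(u in covered or v in covered for u, v in E)


def is_perfect(M, V):
    """M is a matching whose edges touch every vertex."""
    return is_matching(M) and {x for edge in M for x in edge} == set(V)


# Path on four vertices: 1 - 2 - 3 - 4.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4)]
print(is_maximal([(2, 3)], E), is_perfect([(2, 3)], V))
print(is_maximal([(1, 2), (3, 4)], E), is_perfect([(1, 2), (3, 4)], V))
```

The single edge (2, 3) is a maximal matching that is neither perfect nor of maximum cardinality, which illustrates that "maximal" and "maximum cardinality" are different notions.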


Example 6.41. We illustrate a maximal matching and a perfect 
matching in Fig. 6.10. 


Remark 6.42. Let $G = (V,E)$ be a graph. Recall from Definition 2.44
that a vertex cover is a set of vertices $S \subseteq V$ so that
every edge in $E$ is adjacent to at least one vertex in $S$.


Definition 6.43 (Minimal cover). Let $G = (V, E)$ be a graph. A
vertex cover $S$ has minimum cardinality if there is no other vertex
cover $S'$ with smaller cardinality. It is minimal if there is no vertex
we can remove from $S$ to obtain a smaller vertex cover $S'$.


Lemma 6.44. Let $G = (V, E)$ be a graph. If $M$ is a matching in $G$
and $C$ is a covering, then $|M| \leq |C|$.


Fig. 6.10 A maximal matching (left panel) and a perfect matching (right panel).
Note that no other edges can be added to the maximal matching, and the graph
on the left cannot have a perfect matching.
Proof. Each edge in $E$ is adjacent to at least one element of $C$,
meaning that $C$ contains at least one end point of each edge. In
particular, $C$ contains an end point of each edge in $M \subseteq E$. We may
therefore associate to each element of $M$ one of its end points that
lies in $C$. Let $Q$ be the resulting set of vertices. Because no two
edges of $M$ share a vertex, distinct edges are associated with distinct
vertices, so $|M| = |Q|$. Since $Q \subseteq C$, we have $|Q| \leq |C|$,
which implies that $|M| \leq |C|$. Equality can be achieved. For example,
if $G$ is bipartite and the edges of the matching contain every vertex
(i.e., it is perfect), then a covering $C$ with $|C| = |M|$ can be recovered
by simply choosing the correct vertex from each match.


Theorem 6.45 (König's theorem). In a bipartite graph, the
number of edges in a maximum cardinality matching is equal to the
number of vertices in a minimum cardinality covering.


Proof. Let $G = (V,E)$ be the bipartite graph with $V = V_1 \cup V_2$.
Let $M^*$ be a maximum cardinality matching for $G$, and let $C^*$ be
a minimum cardinality covering. First, note that $|M^*| \leq |C^*|$ by
Lemma 6.44.

Construct a new graph $N$ from $G$ by introducing new vertices $s$
and $t$ so that $s$ is adjacent to all vertices in $V_1$ and $t$ is adjacent to
all vertices in $V_2$. This is illustrated in the following.


In the remainder of the proof, $s$ will be our source ($v_1$) and $t$ will be
our sink ($v_m$). Consider a maximal (in cardinality) set $P$ of vertex-disjoint
paths from $s$ to $t$. (Here, we are thinking of $G$ as being
directed from vertices in $V_1$ toward vertices in $V_2$.) Each path $p \in P$
has the form $(s, e_1, v_1, e_2, v_2, e_3, t)$, with $v_1 \in V_1$ and $v_2 \in V_2$. It is
easy to see that we can construct a matching $M(P)$ from $P$, so for
path $p$, we introduce the edge $e_2 = \{v_1, v_2\}$ into our matching $M(P)$.
The fact that the paths in $P$ are vertex disjoint implies that there is a
one-to-one correspondence between elements in $M(P)$ and elements
in $P$. Thus, $|P| \leq |M^*|$ since we assumed that $M^*$ was a maximum
cardinality matching.

Now, consider the smallest set $J \subseteq V$ whose deletion destroys
all paths from $s$ to $t$ in $N$. By way of contradiction, suppose that
$|J| < |C^*|$. Since we assumed that $C^*$ was a minimum cardinality
vertex cover, it follows that $J$ is not itself a vertex cover of $G$, and
thus, $G - J$ leaves at least one edge in $G$. But this edge must connect
a vertex in $V_1$ to a vertex in $V_2$ because $G$ is bipartite. Thus, $N - J$
has a path from $s$ to $t$, which is a contradiction. Thus, $|C^*| \leq |J|$.
Thus, we have inequalities: $|P| \leq |M^*| \leq |C^*| \leq |J|$. However, by
Menger's second theorem, minimizing $|J|$ and maximizing $|P|$ implies
that $|J| = |P|$, and thus, $|M^*| = |C^*|$. This completes the proof.


Remark 6.46. König's theorem does not hold for general graphs,
which can be seen by considering $K_3$. We discuss this more extensively
in a second treatment on network flows using linear programming
in Remark 12.26 in Chapter 12.
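The failure of König's theorem on $K_3$ can be confirmed by brute force. The Python sketch below computes the size of a maximum cardinality matching and of a minimum cardinality vertex cover by trying all subsets; the function names are hypothetical, and the approach is exponential, so it is only meant for very small graphs. A 4-cycle, which is bipartite, is included for contrast.

```python
from itertools import combinations


def max_matching_size(E):
    """Largest number of pairwise vertex-disjoint edges."""
    for r in range(len(E), -1, -1):
        for subset in combinations(E, r):
            ends = [x for edge in subset for x in edge]
            if len(ends) == len(set(ends)):
                return r


def min_cover_size(V, E):
    """Smallest number of vertices touching every edge."""
    for r in range(len(V) + 1):
        for subset in combinations(V, r):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in E):
                return r


k3 = ([1, 2, 3], [(1, 2), (1, 3), (2, 3)])
c4 = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
for V, E in (k3, c4):
    print(max_matching_size(E), min_cover_size(V, E))
```

For $K_3$ the matching has one edge but the cover needs two vertices, so the inequality of Lemma 6.44 is strict; for the bipartite 4-cycle both numbers equal 2, as König's theorem requires.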


6.7 Chapter Notes 


The maximum flow problem was posed by Harris and Ross as an outgrowth
of the military use of operations research and optimization.
In particular, Harris and Ross wanted to analyze Soviet transshipment
capabilities on railways [62]. Ford and Fulkerson [63] published
an incomplete solution that did not treat the possibility of an infinite
loop in the augmentation process. However, their solution had all the
elements of the algorithm given in this chapter, which was developed
by Edmonds and Karp [64]. Interestingly, Israeli (formerly Soviet)
mathematician Dinitz published a more efficient algorithm [65] two
years prior to Edmonds and Karp. This algorithm, called “Dinic’s”
algorithm because of mispronunciation of the author’s name during
its popularization, has a running time of $O(|V|^2|E|)$ as compared
to the running time of $O(|V||E|^2)$ of the Edmonds-Karp algorithm.
However, because of the dense way Dinitz’s paper had to be written
to conform with Soviet journal requirements and the difficulty in
East-West relations at the time, the Edmonds-Karp algorithm is
the one generally taught in the West. The maximum flow problem
has been and continues to be an active area of research. In particular,
the notes for Chapter 12 discuss the interaction of linear programming
and network flows, specifically the additional work of Orlin,
who found an $O(|V||E|)$ algorithm for the maximum flow problem
in 2013 [66]. In 2022, Chen et al. [67] posted a paper to arXiv that
claims a nearly linear solution to the maximum flow problem. The
running time for their algorithm is $O(|E|^{1+o(1)})$, or very close to (but
not quite) $O(|E|)$. At the time of writing this chapter, that paper had
not appeared in a peer-reviewed journal or conference.


6.8 Exercises 


Exercise 6.1 
Prove Claim 1 in the proof of Theorem 6.15. 


Exercise 6.2 
Prove the integral flow theorem. 


Exercise 6.3 
Consider the following sports standings. 


Team | Wins | Losses | Remaining | vs. A | vs. B | vs. C | vs. D


Assuming that the team with the most wins will go to the playoffs 
at the end of the season (and ties will be broken by an extra game) 
and there are no wildcard spots: 


(1) (1 Point) Construct a network flow problem (the picture) to 
determine whether Team B can still go to the playoffs. 

(2) (2 Points) Determine whether Team B can still go to the playoffs. 

(3) (2 Points) Determine whether Team D can still go to the playoffs. 


Exercise 6.4 
Prove Proposition 6.36. 
Exercise 6.5

Prove Menger’s first theorem. [Hint: Enumerate all edge-disjoint
paths from $v_1$ to $v_m$, and replace them with directed paths from
$v_1$ to $v_m$. If any edges remain undirected, then give them arbitrary
direction. Assign each arc a flow capacity of 1.]


Exercise 6.6 

Prove the Erdős-Egerváry theorem: Let $A$ be a matrix of all 0’s and 1’s.
Then, the maximum number of 1’s in matrix A, no two of which lie 
in the same row or column, is equal to the minimum number of rows 
and columns that together contain all the 1’s in A. [Hint: Build a 
bipartite graph whose vertex set is composed of the row and column 
indices of A. Join a row vertex to a column vertex if that position 
contains a 1. Now, apply a theorem from this chapter.] 
Chapter 7


Coloring 


Remark 7.1 (Chapter goals). The goal of this chapter is to intro- 
duce graph coloring. A large part of the chapter is devoted to prov- 
ing that the problem of determining whether a graph can be colored 
by three colors is NP-complete. Therefore, concepts from algorithm 
complexity are introduced. We also prove a number of pure graph- 
theoretic results on graph coloring and discuss how graph coloring 
can be used in scheduling problems. 


7.1 Vertex Coloring of Graphs 


Definition 7.2 (Vertex coloring). Let $G = (V,E)$ be a graph,
and let $C = \{c_1,\dots,c_k\}$ be a finite set of colors (labels). A vertex
coloring is a mapping $c : V \to C$ with the property that if $\{v_1, v_2\} \in
E$, then $c(v_1) \neq c(v_2)$.


Example 7.3. We show an example of a graph coloring in Fig. 7.1. 
Note that no two adjacent vertices share the same color. 


Definition 7.4 (k-colorable). A graph $G = (V, E)$ is $k$-colorable
if there is a vertex coloring with $k$ colors.


Remark 7.5. Clearly, every graph $G = (V, E)$ is $|V|$-colorable since
we can assign a different color to each vertex. We are usually interested
in the minimum number of colors that can be used to color a
graph.

Fig. 7.1 A graph coloring. We need three colors to color this graph.


Definition 7.6 (Chromatic number). Let $G = (V,E)$ be a
graph. The chromatic number of $G$, written $\chi(G)$, is the smallest
positive integer $k$ such that $G$ is $k$-colorable.


Proposition 7.7. Every bipartite graph is 2-colorable.


Proposition 7.8. If $G = (V,E)$ and $|V| = n$, then

$$\chi(G) \geq \frac{n}{\alpha(G)}, \qquad (7.1)$$

where $\alpha(G)$ is the independence number of $G$.


Proof. Suppose $\chi(G) = k$ and consider the set of vertices $V_i =
\{v \in V : c(v) = c_i\}$. Then, this set of vertices is an independent set
and contains at most $\alpha(G)$ elements. Thus,

$$n = |V_1| + |V_2| + \cdots + |V_k| \leq \alpha(G) + \alpha(G) + \cdots + \alpha(G). \qquad (7.2)$$

From this, we see that

$$n \leq k \cdot \alpha(G) \implies \frac{n}{\alpha(G)} \leq k. \qquad (7.3)$$


Proposition 7.9. The chromatic number of $K_n$ is $n$.


Proof. From the previous proposition, we know that

$$\chi(K_n) \geq \frac{n}{\alpha(K_n)}. \qquad (7.4)$$

However, $\alpha(K_n) = 1$, thus $\chi(K_n) \geq n$. From Remark 7.5, it is clear
that $\chi(K_n) \leq n$. Thus, $\chi(K_n) = n$.
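The bound of Proposition 7.8 and the value in Proposition 7.9 can be checked on small graphs by brute force. The Python sketch below is exponential and intended only for tiny examples; the function names and the two test graphs (a 5-cycle and $K_4$) are assumptions chosen for illustration.

```python
from itertools import combinations, product


def independence_number(vertices, edges):
    """Size of a largest vertex set containing no edge."""
    for r in range(len(vertices), 0, -1):
        for subset in combinations(vertices, r):
            chosen = set(subset)
            if not any(u in chosen and v in chosen for u, v in edges):
                return r
    return 0


def chromatic_number(vertices, edges):
    """Smallest k admitting a vertex coloring with k colors (nonempty graph)."""
    index = {v: i for i, v in enumerate(vertices)}
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            if all(colors[index[u]] != colors[index[v]] for u, v in edges):
                return k
    return 0


c5 = ([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])
k4 = ([1, 2, 3, 4], [(a, b) for a in range(1, 5) for b in range(a + 1, 5)])
for V, E in (c5, k4):
    n, alpha, chi = len(V), independence_number(V, E), chromatic_number(V, E)
    print(n, alpha, chi, chi >= n / alpha)
```

For the 5-cycle the bound gives $\chi \geq 5/2$, so at least three colors are needed, and three suffice; for $K_4$ the bound is attained with equality.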
Theorem 7.10. Let $G = (V,E)$ be a graph. Then, $\chi(G) \geq \omega(G)$.
That is, the chromatic number is bounded below by the size of the
largest clique.


Theorem 7.11. If $G = (V,E)$ is a graph with the highest degree of
$\Delta(G)$, then $\chi(G) \leq \Delta(G) + 1$.


Proof. Arrange the vertices of $G$ in ascending order of their degree.
Fix an arbitrary ordering of the colors. Assign an arbitrary color $c_1$
to the first vertex. Repeat this process with each vertex in order,
assigning the lowest-ordered color possible. When any vertex $v$ is to
be colored, the number of colors already used on its neighbors cannot
be any larger than its degree. Since no degree exceeds $\Delta(G)$, the
number of colors excluded at any vertex cannot be any larger than
$\Delta(G)$, thus we might require at most one extra color. Thus,
$\chi(G) \leq \Delta(G) + 1$.
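The coloring procedure used in this proof can be sketched in Python. The sketch assumes the graph is stored as a dictionary mapping each vertex to the set of its neighbors and uses the integers 0, 1, 2, ... as the ordered colors; the function name and the example graph (a 5-cycle) are illustrative assumptions.

```python
def greedy_coloring(adj):
    """Color vertices in ascending order of degree with the lowest available color."""
    color = {}
    for v in sorted(adj, key=lambda x: len(adj[x])):
        used = {color[w] for w in adj[v] if w in color}   # colors on colored neighbors
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# 5-cycle: every vertex has degree 2, so the bound allows at most 3 colors.
adj = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 1}}
color = greedy_coloring(adj)
max_degree = max(len(neighbors) for neighbors in adj.values())
print(color, len(set(color.values())) <= max_degree + 1)
```

Because a vertex has at most $\Delta(G)$ neighbors, the `while` loop never needs to go past color index $\Delta(G)$, which is exactly the bound of Theorem 7.11.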


Corollary 7.12. There is at least one graph for which this bound is
tight (i.e., holds with equality).


Proof. Proposition 7.9 illustrates that for the complete graph,
$\chi(K_n) = \Delta(K_n) + 1 = n$.


Remark 7.13. The coloring heuristic described in Theorem 7.11 is 
called the greedy coloring heuristic. It is a greedy algorithm. 


Proposition 7.14. If $G = (V,E)$ is a graph and $H = (V', E')$ is a
subgraph of $G$, then $\chi(H) \leq \chi(G)$.


Proof. Clearly, if $G$ is $k$-colorable, then so is $H$. Thus, $\chi(H) \leq
\chi(G)$.


Example 7.15 (Exam scheduling as a coloring problem).
Suppose there are four students: Alice, Bob, Charlie, and Donna.
These four students are enrolled in five different classes. Their class
assignments are shown in the directed bipartite graph in Fig. 7.2
(left). For example, we see that Alice is enrolled in Classes 1, 3,
and 5. The goal is to schedule final exams for these classes so that no
student has a conflict; i.e., no student is required to take two or more
exams simultaneously. To solve this problem, create a new graph with
vertices corresponding to classes. Add an edge between two classes
in this graph if and only if those two classes have at least one student
in common. This is illustrated in Fig. 7.2 (right). If we imagine
that each exam slot (e.g., 9-10 AM on Thursday) is a color, then
we can ask how many colors it takes to color the class graph. In
this case, to create an exam schedule in which no student is required
to sit two exams simultaneously, we require at most three colors or
three exam slots. This is shown in Fig. 7.2 (right). This coloring can
be derived using the greedy heuristic described in Theorem 7.11. The
fact that we require exactly three colors follows from the fact that
$K_3$ is a subgraph of the class graph and Proposition 7.14.

Fig. 7.2 Student class schedules can be converted into a graph with classes on
the vertices and edges corresponding to classes that have at least one student in
common. Scheduling exams (i.e., assigning exam slots to classes) then corresponds
to coloring the graph.

This approach to exam scheduling (or similar problems) could 
be generalized to much larger settings. However, as we will see in 
the following sections, it is generally very hard to find an optimal 
coloring for an arbitrary graph. Consequently, we may have to settle 
for a coloring that is generated by the greedy heuristic and (in some 
sense) “good enough.” 
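The exam-scheduling construction can be sketched in Python. Only Alice's enrollment (Classes 1, 3, and 5) is stated in the text; the enrollments of Bob, Charlie, and Donna below are hypothetical stand-ins for the data in Fig. 7.2, and the function names are assumptions. The sketch builds the class graph and then applies the greedy heuristic, using integers as exam slots.

```python
from itertools import combinations


def conflict_graph(enrollment):
    """Classes are vertices; two classes are adjacent if they share a student."""
    classes = sorted({c for taken in enrollment.values() for c in taken})
    adj = {c: set() for c in classes}
    for taken in enrollment.values():
        for a, b in combinations(sorted(taken), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_slots(adj):
    """Assign each class the lowest-numbered slot not used by a conflicting class."""
    slot = {}
    for c in sorted(adj, key=lambda x: len(adj[x])):   # ascending degree
        used = {slot[w] for w in adj[c] if w in slot}
        s = 0
        while s in used:
            s += 1
        slot[c] = s
    return slot


# Alice's classes come from the example; the other rows are made up.
enrollment = {"Alice": {1, 3, 5}, "Bob": {2, 3}, "Charlie": {2, 4}, "Donna": {4, 5}}
adj = conflict_graph(enrollment)
slots = greedy_slots(adj)
print(slots, len(set(slots.values())))
```

With these enrollments, Alice's three classes form a copy of $K_3$ in the class graph, so at least three slots are needed, and the greedy heuristic finds a schedule that uses exactly three.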


Remark 7.16. Before proceeding, recall the following definition:
The graph $K_{m,n}$ is the complete bipartite graph consisting of the
vertex set $V = V_1 \cup V_2 = \{v_{11},\dots,v_{1m}\} \cup \{v_{21},\dots,v_{2n}\}$
and having an edge connecting every element of $V_1$ to every element
of $V_2$. We state the following lemma without proof. A very accessible
proof can be found in Ref. [68].
Lemma 7.17 (Chartrand and Kronk, 1961). Let $G = (V,E)$
be a graph. If every depth-first search tree generated from $G$ is a
Hamiltonian path, then $G$ is either a cycle, or a complete graph, or
a complete bipartite graph.


Remark 7.18. The proof of the following theorem is based on the 
one from Ref. [69]. The proof is tricky and long, so it is probably 
best to read it slowly. 


Theorem 7.19 (Brooks, 1941). If $G = (V, E)$ is connected and is
neither a complete graph nor an odd cycle, then $\chi(G) \leq \Delta(G)$.


Proof. Suppose $G$ is not regular. Choose a vertex $v_0 \in V$ with
a degree of $\delta(G)$ (the smallest degree in $G$) and construct a depth-first
search tree $T$ starting from $v_0$. We now apply the following
algorithm for coloring $G$: At step $k$, choose a leaf from the subtree
$T_k$ of $T$ induced by the set of uncolored vertices. (Note that
$T_0 = T$.) Color the leaf with the lowest possible index color from the
set $\{c_1,\dots,c_{\Delta(G)}\}$. In this way, the last vertex to be colored will be
$v_0$. At each step, when $v \neq v_0$ is about to be colored, it must be
adjacent to at most $\deg(v) - 1$ colored vertices. To see this, note that
$v$ is the leaf of a tree. So, in $T_k$, it must have a degree of 1. Thus, it is
adjacent to at least one uncolored vertex that is not currently a leaf
of $T_k$. Thus, since $v$ is adjacent to at most $\deg(v) - 1 \leq \Delta(G) - 1$
colored vertices, it follows that $v$ can be colored from $\{c_1,\dots,c_{\Delta(G)}\}$. At
last, when $v_0$ is colored, it is adjacent to $\delta(G) \leq \Delta(G) - 1$ colored
vertices, and thus, we may choose a color from $\{c_1,\dots,c_{\Delta(G)}\}$. Thus,
$G$ is $\Delta(G)$-colorable.

Now, suppose that G is regular. There are two possibilities: (i) G
contains a cut vertex $v_0$; or (ii) G does not contain a cut vertex.
Consider case (i) and suppose we remove $v_0$ to obtain several
connected components. If we add $v_0$ back in to each of these
components, then these components are not regular and each is
colorable using at most $\Delta(G)$ colors by our previous result. If
we arrange each of these colorings so that $v_0$ is colored with color
$c_1$, then clearly the original graph is itself colorable using at
most $\Delta(G)$ colors.

We are now left with case (ii) above. Consider the depth-first search
trees of G (it does not matter now which starting vertex $v_0$ is
chosen since all vertices have the same degree). If every such tree is
a Hamiltonian path, then by Lemma 7.17, G is either a complete graph,
a cycle, or a complete bipartite graph. By assumption, G is not a
complete graph, nor is it an odd cycle. If G is an even cycle, then
order the vertices from 1 to $|V|$ and color the odd-numbered vertices
with $c_1$ and the even-numbered vertices with $c_2$. This is clearly
a 2-coloring of G, and $\Delta(G) = 2$. On the other hand, if G is a
complete bipartite graph, then by Proposition 7.7, G is 2-colorable
and $2 \le \Delta(G)$ (because G has at least three vertices since
$K_{1,1} = K_2$, which we discount by assumption).

Finally, suppose that some depth-first search tree T is not a
Hamiltonian path. Then, there is some vertex $v \in T$ with a degree of
at least 3. Suppose that u and w are two vertices adjacent to v that
were added to T after v in the depth-first search. From this, we know
that u and w are not adjacent (if they were, one would have been
reached from the other during the search and would not be adjacent to
v in T). Thus, we can color u and w with color $c_1$, and then, in the
depth-first tree from v, we repeat the same process of coloring
vertices that we used in the non-regular case. When we are about to
color v, since we have used at most $\Delta(G) - 1$ colors to color
the neighbors of v (since w and u share a color), we see that there is
one color remaining for v. Thus, G is $\Delta(G)$-colorable.
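
The coloring procedure used in the non-regular case can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name dfs_leaf_coloring and the dictionary-of-neighbor-sets input format are illustrative choices, and colors are represented by the integers 0, 1, 2, .... The sketch relies on the observation that repeatedly removing a leaf of the tree of uncolored vertices visits the vertices in a depth-first post-order.

```python
def dfs_leaf_coloring(adj):
    """Color a connected, non-regular graph with at most max-degree colors.

    adj maps each vertex to the set of its neighbors (an assumed input
    format). Colors are the integers 0, 1, 2, ...
    """
    root = min(adj, key=lambda v: len(adj[v]))  # a vertex of minimum degree
    # Iterative depth-first search that records vertices in post-order.
    # In post-order every vertex appears after all of its tree descendants,
    # so it is a leaf of the tree induced by the still-uncolored vertices.
    visited = {root}
    order = []
    stack = [(root, iter(sorted(adj[root])))]
    while stack:
        v, neighbors = stack[-1]
        for w in neighbors:
            if w not in visited:
                visited.add(w)
                stack.append((w, iter(sorted(adj[w]))))
                break
        else:
            order.append(v)  # all neighbors explored: v is finished
            stack.pop()
    coloring = {}
    for v in order:  # the root is colored last
        used = {coloring[w] for w in adj[v] if w in coloring}
        coloring[v] = next(c for c in range(len(adj)) if c not in used)
    return coloring


# A small non-regular example graph (hypothetical) with maximum degree 3.
example = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
coloring = dfs_leaf_coloring(example)
```

Because each vertex other than the root still has an uncolored parent when it is colored, the greedy choice never needs more than the maximum degree many colors on such inputs.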


7.2 Some Elementary Logic and NP-Completeness 


Remark 7.20. Our goal in this section is to provide a simple defi- 
nition of propositional calculus and the satisfiability problem so that 
we can use it to prove that determining whether a graph is 3-colorable 
is NP-complete. The majority of the discussion on logic is taken from 
Ref. [70]. 


Definition 7.21. The propositional connectives are: and ($\wedge$),
or ($\vee$), and not ($\neg$). The connectives $\wedge$ and $\vee$ are
binary, while $\neg$ is a unary connective.


Definition 7.22. A propositional language L is a set of propositional
atoms $x_1, x_2, x_3, \dots$. An atomic formula consists of a
propositional atom.


Example 7.23. A propositional atom might be the statement: “It
is raining.” ($x_1$) or “It is cloudy.” ($x_2$).


Definition 7.24. An L-formula is generated inductively as follows:


(1) Any atomic formula is an L-formula.
(2) If $\phi_1$ and $\phi_2$ are two L-formulae, then $\phi_1 \wedge \phi_2$ is an L-formula.
(3) If $\phi_1$ and $\phi_2$ are two L-formulae, then $\phi_1 \vee \phi_2$ is an L-formula.
(4) If $\phi$ is an L-formula, then $\neg\phi$ is an L-formula.


Example 7.25. Continuing from Example 7.23, we might have the
formula $x_1 \wedge x_2$, meaning “It is raining and it is cloudy.”


Example 7.26. If $x_1$, $x_2$, and $x_3$ are propositional atoms, then
$x_1 \wedge (\neg x_2 \vee x_3)$ is an L-formula.


Definition 7.27. An L-assignment is a mapping $M : L \to \{T, F\}$
that assigns to each propositional atom the value of TRUE (T) or
FALSE (F).


Remark 7.28. The following proposition follows directly from 
induction on the number of connectives in an L-formula. 


Proposition 7.29. Given an L-assignment M, there is a unique valuation
$v_M$ so that for any L-formula $\phi$, the value
$v_M(\phi) \in \{T, F\}$ is given by the following:


(1) If $\phi$ is atomic, then $v_M(\phi) = M(\phi)$.

(2) If $\phi = \phi_1 \vee \phi_2$, then $v_M(\phi) = F$ if and only if
$v_M(\phi_1) = F$ and $v_M(\phi_2) = F$. Otherwise, $v_M(\phi) = T$.

(3) If $\phi = \phi_1 \wedge \phi_2$, then $v_M(\phi) = T$ if and only if
$v_M(\phi_1) = T$ and $v_M(\phi_2) = T$. Otherwise, $v_M(\phi) = F$.

(4) If $\phi = \neg\phi_1$, then $v_M(\phi) = T$ if and only if
$v_M(\phi_1) = F$. Otherwise, $v_M(\phi) = F$.


Example 7.30. Consider the formula $x_1 \wedge (\neg x_2 \vee x_3)$.
If $M(x_1) = F$ and $M(x_2) = M(x_3) = T$, then $v_M(\neg x_2) = F$,
$v_M(\neg x_2 \vee x_3) = T$, and
$v_M(x_1 \wedge (\neg x_2 \vee x_3)) = F$.
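
The inductive rules of Proposition 7.29 translate almost directly into a recursive function. The following is a minimal sketch, assuming a representation of our own choosing: an atom is a string, and a compound formula is a tuple whose first entry names the connective. The names valuation, phi, and M are illustrative only.

```python
def valuation(formula, assignment):
    """Evaluate an L-formula under an L-assignment.

    A formula is an atom name (a string) or one of the tuples
    ("not", f), ("and", f, g), ("or", f, g). The assignment is a
    dictionary from atom names to True/False.
    """
    if isinstance(formula, str):          # rule (1): atomic formula
        return assignment[formula]
    op = formula[0]
    if op == "not":                       # rule (4)
        return not valuation(formula[1], assignment)
    if op == "and":                       # rule (3)
        return valuation(formula[1], assignment) and valuation(formula[2], assignment)
    if op == "or":                        # rule (2)
        return valuation(formula[1], assignment) or valuation(formula[2], assignment)
    raise ValueError(f"unknown connective: {op!r}")


# The formula and assignment of Example 7.30.
phi = ("and", "x1", ("or", ("not", "x2"), "x3"))
M = {"x1": False, "x2": True, "x3": True}
```

With these definitions, valuation(phi, M) evaluates to False, in agreement with Example 7.30.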


Definition 7.31 (Satisfiable). An L-formula $\phi$ is satisfiable if
there is some L-assignment M so that $v_M(\phi) = T$. A set of
formulae S is satisfiable if there is some L-assignment M so that for
every $\phi \in S$, $v_M(\phi) = T$. That is, every formula in S
evaluates to true under the assignment M.


Example 7.32. The formula $x_1 \wedge (\neg x_2 \vee x_3)$ is
satisfiable when we have $M(x_1) = T$ and $M(x_2) = M(x_3) = T$.


Definition 7.33 (3-satisfiability). Suppose we consider a (finite)
set of formulae S with the following properties:


(1) Every formula contains exactly three atoms or their negations.
(2) The atoms (or their negations) are connected by or ($\vee$)
connectives.


For any arbitrary S, the question of whether S is satisfiable is called
the 3-satisfiability problem or 3-SAT.


Example 7.34. Suppose S consists of the formulae:


(1) $x_1 \vee \neg x_2 \vee x_3$; and
(2) $x_4 \vee x_1 \vee \neg x_3$.


Then, the question of whether S is satisfiable is an instance of
3-SAT.


Remark 7.35. Note that we can express each 3-SAT problem as
a problem of satisfiability of one formula. In our previous example,
we are really attempting to determine whether


$(x_1 \vee \neg x_2 \vee x_3) \wedge (x_4 \vee x_1 \vee \neg x_3)$


is satisfiable. This is the way 3-SAT is usually expressed. A formula
of this type, consisting of a collection of many “or” formulae
combined with “and’s,” is said to be in conjunctive normal form (CNF).
As a result, this is sometimes called 3-CNF-SAT.
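
To make the satisfiability question concrete, the following sketch simply tries every L-assignment. It is an illustration under our own conventions rather than a practical solver: a clause is a tuple of nonzero integers, where i stands for the atom x_i and -i for its negation, and the name satisfying_assignment is hypothetical. The running time grows like 2^m in the number m of atoms.

```python
from itertools import product


def satisfying_assignment(clauses, num_atoms):
    """Return a satisfying assignment as a dict, or None if none exists.

    Each clause is a tuple of nonzero integers: i stands for x_i and -i
    for the negation of x_i. All 2**num_atoms assignments are tried.
    """
    for values in product([False, True], repeat=num_atoms):
        M = {i + 1: values[i] for i in range(num_atoms)}
        # A clause is true when at least one of its literals is true;
        # the set of clauses is satisfied when every clause is true.
        if all(any(M[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return M
    return None
```

For instance, the eight clauses obtained by taking every sign pattern on three atoms cannot all be satisfied at once, so the function returns None on that input.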


Remark 7.36 (NP-completeness). A true/false question (problem) P is
NP-complete if (i) the answer can be checked (but not necessarily
obtained) in polynomial time and (ii) any other true/false question
whose answer can be checked in polynomial time can be transformed
into P, with the transformation itself accomplished in polynomial
time. More details can be found in Ref. [28].

This definition can be a little hard to understand, and in a sense,
the rest of the chapter will be used to build an understanding of what
we mean. The essential idea is that it is easy to check a solution
to any NP-complete problem. For example, if a graph coloring is
provided, it is easy to check that it correctly colors a graph and to
count the number of colors it has. But it may not be easy to find that
coloring.

The second part deals with the “completeness.” It simply says 
that “if you could solve one NP-complete problem, then you can solve 
them all.” This process of transforming one problem into another is 
called reduction, and in this context, it is only useful if the reduction 
can be accomplished efficiently (in polynomial time). 
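
The "easy to check" half of the definition can be illustrated with graph coloring. The sketch below is an assumption-laden illustration (the function name is_proper_coloring and the edge-list input format are our own choices): it verifies a proposed coloring with one pass over the edges, which takes time proportional to the size of the graph.

```python
def is_proper_coloring(edges, coloring, k):
    """Check that a proposed coloring is proper and uses at most k colors.

    edges is a list of vertex pairs and coloring maps vertices to colors.
    The check makes a single pass over the colors and over the edges.
    """
    if len(set(coloring.values())) > k:
        return False
    return all(coloring[u] != coloring[v] for u, v in edges)


# A 5-cycle with a proposed 3-coloring (illustrative data).
five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
proposed = {0: "red", 1: "green", 2: "red", 3: "green", 4: "blue"}
```

Checking a given coloring is fast; nothing in this sketch helps with finding one.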


Remark 7.37. We state, but do not prove, the following theorem, 
which was shown in Karp’s original 21 NP-complete problems [71]. 


Theorem 7.38. The problem of deciding 3-SAT is NP-complete.


Remark 7.39. What the previous theorem means is that (unless 
P = NP), any algorithm that produces an L-assignment satisfying a 
set of L-formulae S composed of formulae of the type from Defini- 
tion 7.33 or determines that one does not exist may take a very long 
time to run, but the answer it gives can be verified in a polynomial 
number of operations in the number of atoms and the size of S, as 
illustrated by Proposition 7.29. 


7.3 NP-Completeness of k-Coloring 


Remark 7.40. Our goal in this section is to prove that the problem
of determining whether a graph is 3-colorable is NP-complete. We do
this by showing that there is a polynomial time reduction from the
3-SAT problem. What this means is that given an instance of the 3-SAT
problem, we show that we can construct, in a polynomial amount of time
(as a function of the size of S), a graph that is 3-colorable if and
only if the 3-SAT instance is satisfiable. Thus, if we could solve the
3-colorability problem in a polynomial amount of time, we’d be able to
solve 3-SAT in a polynomial amount of time. Since 3-SAT is NP-complete
and a proposed 3-coloring can be checked in polynomial time, this
implies that 3-coloring is NP-complete.


Theorem 7.41. Deciding whether a graph is 3-colorable is NP-complete.


Proof. Consider an instance of 3-SAT with a finite set of formulae S.
We construct a graph G that is 3-colorable if and only if S is
satisfiable, and we argue that this construction can be completed
in a number of operations that is a polynomial function of the number
of formulae in S and the number of atoms in the underlying
propositional language.

We initialize the graph G with three vertices {T, F, B} that form
a complete subgraph. Here, T will be a vertex representing TRUE,
F will be the vertex representing FALSE, and B is a bridge vertex.
Without loss of generality, assume we color T green, F red, and B
blue. This is shown in Fig. 7.3.

For each propositional atom $x_i$ in the logical language L we are
considering, add two vertices $v_i$ and $v_i'$ to G. Add an edge
$\{v_i, v_i'\}$ to G, as well as edges $\{v_i, B\}$ and $\{v_i', B\}$.
This ensures that (i) $v_i$ and $v_i'$ cannot have the same color and
(ii) neither $v_i$ nor $v_i'$ can have the same color as vertex B.
Thus, one must be colored green and the other red. That means either
$x_i$ is true (corresponding to $v_i$ colored green) or $\neg x_i$ is
true (corresponding to $v_i'$ colored green). This is illustrated in
Fig. 7.4.

By assumption, each formula $\phi_j$ in S has the structure
$\alpha_j(x_{j_1}) \vee \beta_j(x_{j_2}) \vee \gamma_j(x_{j_3})$, where
$\alpha_j(x_{j_1}) = x_{j_1}$ if $\phi_j = x_{j_1} \vee \cdots$ and
$\alpha_j(x_{j_1}) = \neg x_{j_1}$ if $\phi_j = \neg x_{j_1} \vee \cdots$.
The effects of $\beta_j$ and $\gamma_j$ are defined similarly. Add the
five vertices $t_{j_1}, t_{j_2}, \dots, t_{j_5}$ to the graph with the
properties that


(1) $t_{j_1}$, $t_{j_2}$, and $t_{j_4}$ form the subgraph $K_3$;
(2) $t_{j_5}$ is adjacent to $t_{j_4}$;
(3) $t_{j_5}$ is adjacent to $t_{j_3}$;
(4) both $t_{j_3}$ and $t_{j_5}$ are adjacent to T, and

(a) if $\alpha_j(x_{j_1}) = x_{j_1}$, then $t_{j_1}$ is adjacent to
$v_{j_1}$, otherwise $t_{j_1}$ is adjacent to $v_{j_1}'$,

(b) if $\beta_j(x_{j_2}) = x_{j_2}$, then $t_{j_2}$ is adjacent to
$v_{j_2}$, otherwise $t_{j_2}$ is adjacent to $v_{j_2}'$,

(c) if $\gamma_j(x_{j_3}) = x_{j_3}$, then $t_{j_3}$ is adjacent to
$v_{j_3}$, otherwise $t_{j_3}$ is adjacent to $v_{j_3}'$.


Fig. 7.3 At the first step of constructing G, we add three vertices {T, F, B} that
form a complete subgraph.


Fig. 7.4 At the second step of constructing G, we add two vertices $v_i$ and $v_i'$ to
G and an edge $\{v_i, v_i'\}$.


This construction is illustrated in Fig. 7.5 for the case when
$\phi_j = x_{j_1} \vee x_{j_2} \vee x_{j_3}$. We must now show that
there is a 3-coloring for this graph if and only if S is satisfiable.
Without loss of generality, we show the construction for the case when
$\phi_j = x_{j_1} \vee x_{j_2} \vee x_{j_3}$. All other cases follow by
an identical argument with a modified graph structure. For the
remainder of this proof, let v be a valuation function.


Claim 1. If $v(x_{j_1}) = v(x_{j_2}) = v(x_{j_3}) =$ FALSE, then G is
not 3-colorable.


Proof. To see this, observe that either $t_{j_1}$ or $t_{j_2}$ must be
colored blue and the other green since $v_{j_1}$, $v_{j_2}$, and
$v_{j_3}$ are colored red. Thus, $t_{j_4}$ must be colored red.
Furthermore, since $v_{j_3}$ is colored red and T is colored green, it
follows that $t_{j_3}$ must be colored blue. But then, $t_{j_5}$ is
adjacent to a green vertex (T), a red vertex ($t_{j_4}$), and a blue
vertex ($t_{j_3}$). Thus, we require a fourth color. This is
illustrated in Fig. 7.6(a).


Fig. 7.5 At the third step of constructing G, we add a “gadget” that is built
specifically for the formula $\phi_j$.


Claim 2. If $v(x_{j_1}) =$ TRUE or $v(x_{j_2}) =$ TRUE or
$v(x_{j_3}) =$ TRUE, then G is 3-colorable.


Proof. The proof of the claim is illustrated in Fig. 7.6(b)-(h).


Our two claims show that, by our construction, G is 3-colorable if and
only if every formula of S can be simultaneously satisfied by some
assignment of TRUE or FALSE to the atomic propositions. (It should be
clear that variations of Claims 1 and 2 are true by symmetry arguments
for any other possible value of $\phi_j$; e.g.,
$\phi_j = x_{j_1} \vee \neg x_{j_2} \vee x_{j_3}$.) If we have n
formulae in S and m atomic propositions, then G has $5n + 2m + 3$
vertices and $3m + 10n + 3$ edges, and thus, G can be constructed in a
polynomial amount of time from S. It follows at once that since 3-SAT
is NP-complete, so is the question of whether an arbitrary graph is
3-colorable. This completes the proof.


Fig. 7.6 (Panels: (a) Case FFF; (b) Case FFT; (c) Case FTF; (d) Case FTT;
(e) Case TFF; (f) Case TFT; (h) Case TTT.) When $\phi_j$ evaluates to false, the
graph G is not 3-colorable, as illustrated in Subfigure (a). When $\phi_j$
evaluates to true, the resulting graph is colorable. By the label TFT, we mean
$v(x_{j_1}) = v(x_{j_3}) =$ TRUE and $v(x_{j_2}) =$ FALSE.
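
The construction in the proof can be checked on very small instances by building G and searching for a 3-coloring by backtracking. The following is a sketch only, with several assumptions of our own: clauses are triples of nonzero integers (i for x_i and -i for its negation), the vertex labels are illustrative tuples, the gadget is wired so that its first three vertices attach to the three literals of the clause, and, to keep the unsatisfiable test instance tiny, a clause is allowed to repeat a literal. The function names reduction_graph and three_coloring are hypothetical.

```python
def reduction_graph(clauses, num_atoms):
    """Build the edge set of G from a 3-SAT instance.

    Clauses are triples of nonzero integers (i for x_i, -i for its
    negation). Vertex labels are tuples chosen for illustration.
    """
    T, F, B = ("T",), ("F",), ("B",)
    edges = {(T, F), (T, B), (F, B)}
    for i in range(1, num_atoms + 1):
        v, v_neg = ("v", i), ("v'", i)
        edges |= {(v, v_neg), (v, B), (v_neg, B)}
    for j, clause in enumerate(clauses):
        t1, t2, t3, t4, t5 = [("t", j, r) for r in range(1, 6)]
        edges |= {(t1, t2), (t1, t4), (t2, t4),  # the triangle K_3
                  (t5, t4), (t5, t3),            # edges at the fifth vertex
                  (t3, T), (t5, T)}              # both are adjacent to T
        for t, lit in zip((t1, t2, t3), clause):
            edges.add((t, ("v", lit) if lit > 0 else ("v'", -lit)))
    return edges


def three_coloring(edges):
    """Return a proper 3-coloring as a dict, or None, by backtracking."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    vertices = sorted(adj, key=lambda v: -len(adj[v]))  # high degree first
    coloring = {}

    def extend(k):
        if k == len(vertices):
            return True
        v = vertices[k]
        for c in range(3):
            if all(coloring.get(w) != c for w in adj[v]):
                coloring[v] = c
                if extend(k + 1):
                    return True
                del coloring[v]
        return False

    return dict(coloring) if extend(0) else None


satisfiable = [(1, -2, 3), (-1, 2, -3)]
unsatisfiable = [(1, 1, 1), (-1, -1, -1)]  # repeated literals, for brevity
```

The backtracking search takes exponential time in general, so this only serves to sanity-check the equivalence on toy inputs; it says nothing about solving either problem efficiently.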


Corollary 7.42. For an arbitrary $k \ge 3$, deciding whether a graph is
k-colorable is NP-complete.


7.4 Graph Sizes and k-Colorability 


Remark 7.43. It is clear from Theorem 7.10 that graphs with arbi- 
trarily high chromatic numbers exist. What is interesting is that we 
can induce such graphs without the need to induce cliques. In partic- 
ular, we can show constructively that graphs with arbitrarily large 
girths exist, and these graphs have large chromatic numbers. 


Lemma 7.44. In a k-coloring of a graph $G = (V, E)$ with $\chi(G) = k$,
there is a vertex of each color that is adjacent to vertices of every
other color.


Proof. Consider any k-coloring of the graph in question. For any
color $c_i$ ($i \in \{1, \dots, k\}$), there is at least one vertex v
with color $c_i$ whose color cannot be changed. (If not, then we would
repeat the process of recoloring vertices colored $c_i$ until we need
only $k - 1$ colors.) Now, suppose this v is not adjacent to vertices
of all $k - 1$ colors other than $c_i$. Then, we could recolor v with
a missing color, contradicting our assumption on v. Thus, we see v
must have a degree of at least $k - 1$ and is adjacent to a vertex
colored with each color in $\{c_1, \dots, c_k\}$ other than $c_i$.


Theorem 7.45. For any positive k, there exists a triangle-free graph 
with a chromatic number of k. 


Proof. For k = 1 and k = 2, clearly $K_1$ and $K_2$ satisfy the
criteria of the theorem. We proceed by induction. Assuming that the
statement is true up to some arbitrary k, we show that the result is
true for k + 1. Let $G_k$ be a triangle-free graph with
$\chi(G_k) = k$. (That is, $G_k$ does not contain a clique on three or
more vertices since it does not contain a subgraph isomorphic to
$K_3$.) Suppose that $V_k = \{v_1, \dots, v_n\}$ are the vertices of
$G_k$. We construct $G_{k+1}$ in the following way:


(1) Add n + 1 vertices to $G_k$: $u_1$, corresponding to $v_1$, through
$u_n$, corresponding to $v_n$, and an extra vertex v.
(2) Add an edge from each $u_i$ to v to form a star graph (as a
subgraph of $G_{k+1}$).
(3) Add an edge from $u_i$ to each neighbor of $v_i$. (That is, $u_i$
becomes adjacent to $v_j$ whenever $v_i$ is adjacent to $v_j$ in $G_k$.)


This is illustrated for constructing $G_3$ from $G_2$ in Fig. 7.7.


Fig. 7.7 Constructing $G_3$ from $G_2 = K_2$.


Claim 3. The graph $G_{k+1}$ contains no triangles.


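Proof. The set $U = \{u_1, \dots, u_n\}$ is an independent set, and v
is adjacent only to elements of U. Thus, no triangle contains v, and
any subgraph of $G_{k+1}$ isomorphic to $K_3$ contains at most one
element of U. Since $G_k$ is triangle-free, such a subgraph must
contain exactly one element of U. Therefore, suppose that the vertices
$u_i$, $v_j$, and $v_\ell$ form a triangle; i.e., there are edges
$\{u_i, v_j\}$, $\{u_i, v_\ell\}$, and $\{v_j, v_\ell\}$. Then, since
$u_i$ is adjacent to $v_j$ and $v_\ell$ in $G_{k+1}$, it follows that
$v_j$ is a neighbor of $v_i$ and $v_\ell$ is a neighbor of $v_i$;
therefore, the edges $\{v_i, v_j\}$, $\{v_i, v_\ell\}$, and
$\{v_j, v_\ell\}$ exist, and thus, there is a triangle in $G_k$,
contradicting our inductive hypothesis.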


It now suffices to show that $\chi(G_{k+1}) = k + 1$. It is clear that
$G_{k+1}$ is (k + 1)-colorable since any k-coloring of $G_k$ can be
extended to $G_{k+1}$ by coloring each $u_i$ with the same color as
$v_i$ and then coloring v with a (k + 1)st color. Now, suppose that
$G_{k+1}$ is k-colorable. Such a coloring restricts to a k-coloring of
$G_k$, so applying Lemma 7.44, for each color c there is a vertex
$v_i$ having color c that is adjacent to vertices of $G_k$ having
every other color. Since $u_i$ is adjacent to the same vertices of
$G_k$ as $v_i$, it follows that $u_i$ must also be colored c. Thus,
all k colors appear in the vertices of U. But, since each vertex
$u_i$ is adjacent to v, there is no color available for v, and thus,
$G_{k+1}$ must have a chromatic number of k + 1.
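
The construction in this proof (commonly known as Mycielski's construction) is easy to experiment with. The sketch below makes assumptions of our own: the vertices of $G_k$ are numbered 0 through n - 1, the new vertex $u_i$ is numbered n + i, the extra vertex v is numbered 2n, and the function names are illustrative. The chromatic number is found by exhaustive search, which is only feasible for the first few graphs in the sequence.

```python
from itertools import product


def mycielski_step(n, edges):
    """Given G_k on vertices 0..n-1, return (n', edges') for G_{k+1}.

    The copy u_i of vertex i is labeled n + i and the extra vertex v is
    labeled 2n (labels are an illustrative convention).
    """
    new_edges = set(edges)
    for i in range(n):
        new_edges.add((n + i, 2 * n))   # step (2): the star centered at v
    for a, b in edges:
        new_edges.add((n + a, b))       # step (3): u_a is adjacent to N(a)
        new_edges.add((n + b, a))       #           u_b is adjacent to N(b)
    return 2 * n + 1, new_edges


def has_triangle(n, edges):
    """True if some edge has endpoints with a common neighbor."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return any(adj[a] & adj[b] for a, b in edges)


def chromatic_number(n, edges):
    """Smallest k admitting a proper k-coloring, by exhaustive search."""
    for k in range(1, n + 1):
        for colors in product(range(k), repeat=n):
            if all(colors[a] != colors[b] for a, b in edges):
                return k
    return 0


n2, e2 = 2, {(0, 1)}                 # G_2 = K_2
n3, e3 = mycielski_step(n2, e2)      # G_3, a 5-cycle
n4, e4 = mycielski_step(n3, e3)      # G_4, with 11 vertices
```

Under these conventions the graphs G_3 and G_4 have 5 and 11 vertices, respectively, and one can confirm directly that each is triangle-free while the chromatic number rises by one at each step.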
7.5 Chapter Notes


Graph-coloring problems originate in the coloring of planar graphs, 
which are not discussed in this book (in order to make more room 
for algebraic graph theory). A graph is planar if it can be drawn on 
a sheet of paper so that none of its edges cross. Edges do not need to 
be drawn as straight lines but may curve to avoid intersecting other 
edges. It is an interesting exercise to show that $K_4$ is planar whereas
$K_5$ is not. A simple proof relies on the Euler polyhedral formula
relating the number of edges, faces, and vertices of a polyhedron. 
Ref. [24] has a well-written introduction to planar graphs. 

Planar graphs were studied in relation to the four color theo- 
rem: Any geopolitical map can be colored with at most four col- 
ors in a way that no adjacent states share a color. This problem 
was first introduced by Cayley to the London Mathematical Society 
[72]. The British mathematician Alfred Kempe wrote a false proof, 
as shown by fellow mathematician Percy Heawood, who constructed 
a proof of the five color theorem in the same paper. The problem 
remained open until 1976, when the four color theorem was proved 
by Appel and Haken (see Ref. [73] for the announcement and dis- 
cussion by the authors). Interestingly, this was an early (perhaps the 
earliest) computer-aided proof in mathematics, which set the stage 
for later computer-assisted proof methods [74]. 

There has been a substantial amount of crossover between graph 
coloring and abstract algebra with the introduction of the chromatic 
polynomial by Birkhoff [75], which he used to study planar graph 
coloring. The work and its extensions are summarized in Ref. [76], 
which also discusses the generalizations of Whitney [77]. The
chromatic polynomial $P_G(k)$ can be used to count the number of proper
k-colorings of a graph G and is generalized by the Tutte polynomial.
(See Ref. [78] for a survey of graph polynomials by Tutte.) The
chromatic polynomial and other graph polynomials are a major com- 
ponent of algebraic graph theory, which we discuss in the following 
chapters. 

In addition to vertex colorings, one can also study the edge color- 
ings of graphs. Gross and Yellen [19] and Diestel [24] have authored 
chapters on graph colorings that discuss edge coloring. Diestel [24] 
also proves the five color theorem.


7.6 Exercises


Exercise 7.1 
Prove Proposition 7.7. 


Exercise 7.2 
Prove Theorem 7.10. 


Exercise 7.3 
Use the greedy coloring heuristic described in the proof of Theo- 
rem 7.11 to find a coloring for the Petersen graph. 


Exercise 7.4 
Use the algorithm described in the proof of Theorem 7.19 to compute 
a coloring for the Petersen graph. 


Exercise 7.5 
What is the chromatic number of the following graph? 


Exercise 7.6
Verify whether the L-assignment $M(x_1) = T$ and $M(x_2) = M(x_3) = T$
satisfies $x_1 \wedge (\neg x_2 \vee x_3)$.


Exercise 7.7 
Construct an example of a set of L-formulae (on any set of atoms 
that you like) that is not satisfiable. 


Exercise 7.8 
Determine whether S in Example 7.34 is satisfiable. Illustrate with 
a specific L-assignment.


Exercise 7.9

A degree-k-constrained spanning tree of a graph G is a spanning tree 
T of G in which each vertex in T has a degree of at most k. Prove that 
the question: “Does a graph have a degree-k-constrained minimum 
spanning tree?” is NP-complete. To do this, you may assume the 
question “Does a graph have a Hamiltonian path?” is NP-complete. 
[Hint: Think about the case when k = 2.] 


Exercise 7.10

Let G be a graph. Show that if G is k-colorable and we construct G'
from G by adding one edge, then G' is (k + 1)-colorable. Show
an example where this bound is tight.


Some Algebraic Graph Theory


Chapter 8


Algebraic Graph Theory with 
Abstract Algebra 


Remark 8.1 (Chapter goals). In this chapter, we explore graph 
properties using concepts from abstract algebra. We begin by dis- 
cussing graph isomorphism. We then introduce basic concepts from 
group theory, including permutation groups. We then use these 
concepts to discuss graph automorphisms. This forms the first 
component of our exploration of algebraic graph theory. 


8.1 Isomorphism and Automorphism 


Definition 8.2 (Injective mapping). Let S and T be sets. A
function $f : S \to T$ is injective (sometimes one-to-one) if for all
$s_1, s_2 \in S$: $f(s_1) = f(s_2) \implies s_1 = s_2$.


Definition 8.3 (Surjective mapping). Let S and T be sets. A
function $f : S \to T$ is surjective (sometimes onto) if for all
$t \in T$, there exists an $s \in S$ such that $f(s) = t$.


Definition 8.4 (Bijective mapping). Let S and T be sets. A function
$f : S \to T$ is bijective if f is both injective and surjective.
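
For finite sets, these three definitions can be checked directly. The following sketch is only an illustration; the function names and the choice to pass the domain and codomain as explicit collections (with no repeated elements) are our own assumptions.

```python
def is_injective(f, domain):
    """True if distinct elements of the domain have distinct images."""
    images = [f(s) for s in domain]
    return len(set(images)) == len(images)


def is_surjective(f, domain, codomain):
    """True if every element of the codomain is the image of something."""
    return set(codomain) <= {f(s) for s in domain}


def is_bijective(f, domain, codomain):
    """True if f is both injective and surjective."""
    return is_injective(f, domain) and is_surjective(f, domain, codomain)
```

For example, the map sending x to 2x from {0, 1, 2} into {0, 1, 2, 3, 4} is injective but not surjective, while the map sending x to (x + 1) mod 3 on {0, 1, 2} is a bijection.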


Remark 8.5. An injection, a surjection, and a bijection are
illustrated in Fig. 8.1.


Fig. 8.1 An injection (left), a surjection (middle), and a bijection (right) are
illustrated visually.


8.2 Graph Isomorphism 


Definition 8.6 (Graph isomorphism). Let $G = (V, E)$ and
$G' = (V', E')$. The graphs G and G' are isomorphic if there is a
bijective mapping $f : V \to V'$ such that for all $v_1, v_2 \in V$,
we have


$\{v_1, v_2\} \in E \iff \{f(v_1), f(v_2)\} \in E'$. (8.1)


In this case, the mapping f is called a graph isomorphism. If G and
G' are isomorphic, we write $G \cong G'$.


Definition 8.7. Let $G = (V, E)$ be a graph. Then, the set
$\{H : H \cong G\}$ is called the isomorphism type (or isomorphism
class) of G.


Theorem 8.8 (Graph invariant theorem). Suppose $G = (V, E)$ and
$G' = (V', E')$ are graphs, with $G \cong G'$ and $f : V \to V'$ the
graph isomorphism between the graphs. Further suppose that the degree
sequence of G is d and the degree sequence of G' is d'. Then:


(1) $|V| = |V'|$ and $|E| = |E'|$;
(2) for all $v \in V$, $\deg(v) = \deg(f(v))$;
(3) $d = d'$;
(4) for all $v \in V$, $\mathrm{ecc}(v) = \mathrm{ecc}(f(v))$;
(5) $\omega(G) = \omega(G')$ (recall that $\omega(G)$ is the clique
number of G);
(6) $\alpha(G) = \alpha(G')$ (recall that $\alpha(G)$ is the
independence number of G);
(7) $c(G) = c(G')$ (recall that $c(G)$ is the number of components of
G);
(8) $\mathrm{diam}(G) = \mathrm{diam}(G')$;
(9) $\mathrm{rad}(G) = \mathrm{rad}(G')$;
(10) the girth of G is equal to the girth of G';
(11) the circumference of G is equal to the circumference of G'.


Remark 8.9. The proof of Theorem 8.8 is long and should be clear
from the definition of isomorphism. Isomorphism is really just a way 
of renaming vertices; we assume that the vertices in the graph G are 
named from the set V, while the vertices in the graph G’ are named 
from the set V’. If the graphs are identical except for the names we 
give the vertices (and thus the names that appear in the edges), then 
the graphs are isomorphic, and all structural properties are preserved 
as a result of this. 


Remark 8.10. The converse of Theorem 8.8 does not hold. We
illustrate this in Example 8.11.


Example 8.11. Given two graphs G and G', we can see by example
that the degree sequence does not uniquely specify the graph G, and
thus, if G and G' have degree sequences d and d', respectively, it is
necessary that d = d' when $G \cong G'$ but not sufficient to establish
isomorphism. To see this, consider the graphs shown in Fig. 8.2.
It’s clear that d = (2,2,2,2,2,2) = d', but these graphs cannot be
isomorphic since they have different numbers of components.

The same is true with the other graph properties. The equality 
between a property of G and that same property for G’ is a necessary 
but insufficient criterion for the isomorphism of G and G’. We will 
not encounter any property of a graph that provides such a necessary 
and sufficient condition (see Remark 8.15). 
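
A short computation illustrates how invariants can rule out isomorphism without ever establishing it. The sketch below compares a 6-cycle with two disjoint triangles; this pair is an assumption on our part, chosen to be consistent with the degree sequence (2,2,2,2,2,2) in Example 8.11, and the function names are illustrative.

```python
def degree_sequence(n, edges):
    """Degrees of the vertices 0..n-1, sorted in decreasing order."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree, reverse=True)


def count_components(n, edges):
    """Number of connected components, found by depth-first search."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v] - seen:
                seen.add(w)
                stack.append(w)
    return components


six_cycle = [(i, (i + 1) % 6) for i in range(6)]
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
```

Both graphs have the same degree sequence, yet the component counts differ, so by Theorem 8.8 they cannot be isomorphic.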


Fig. 8.2 Two graphs that have identical degree sequences but are not isomorphic.


Theorem 8.12. Suppose that $G = (V, E)$ and $G' = (V', E')$ are
graphs with $G \cong G'$ and that $f : V \to V'$ is the graph
isomorphism between the graphs. If H is a subgraph of G, then
$H' = f(H)$ is a subgraph of G'. (Here, $f(H)$ is the image of the
subgraph H under the isomorphism f.)


Definition 8.13 (Graph isomorphism problem). Given two
graphs $G = (V, E)$ and $G' = (V', E')$, the graph isomorphism problem
is to determine whether or not G and G' are isomorphic.


Definition 8.14 (Subgraph isomorphism). Given two graphs 
G = (V,E) and H = (V’,E’), the subgraph isomorphism problem 
is to determine whether G contains a subgraph that is isomorphic 
to H. 


Remark 8.15. In general, the subgraph isomorphism problem
is NP-complete. The graph isomorphism problem (interestingly
enough) is a bit of an enigma. We do not know exactly how hard
this problem is to solve: it is not known to be solvable in polynomial
time, nor is it known to be NP-complete. We do know that it is no
harder than the subgraph isomorphism problem. It is worthwhile noting,
however, that there is a linear time algorithm for determining the
isomorphism of two trees (see p. 84 of Ref. [79]).


Definition 8.16 (Automorphism). Let $G = (V, E)$ be a graph.
An automorphism is an isomorphism from G to itself. That is, a
bijection $f : V \to V$ so that for all $v_1, v_2 \in V$,


$\{v_1, v_2\} \in E \iff \{f(v_1), f(v_2)\} \in E$.


Remark 8.17 (Inverse automorphism). Recall that an isomorphism
(and hence an automorphism) is a bijective function, and hence, it
has a well-defined inverse. That is, if $G = (V, E)$ is a graph and
$f : V \to V$ is an automorphism, then if $f(v_1) = f(v_2)$, we know
that $v_1 = v_2$ (because f is injective). Furthermore, we know that
for every $v_2 \in V$, there is a (unique) $v_1 \in V$ so that
$f(v_1) = v_2$ (because f is surjective). Thus, if $v_2 \in V$, we can
define $f^{-1}(v_2)$ to be the unique $v_1$ so that $f(v_1) = v_2$.


Lemma 8.18. Let $G = (V, E)$ be a graph. Suppose that $f : V \to V$
is an automorphism. Then, $f^{-1} : V \to V$ is also an automorphism.


Proof. The fact that f is a bijection implies that $f^{-1}$ is itself a
bijection. We know for all $v_1$ and $v_2$ in V that


$\{v_1, v_2\} \in E \iff \{f(v_1), f(v_2)\} \in E$.


For every vertex pair $u_1$ and $u_2$ in V, there are unique vertices
$v_1$ and $v_2$ in V so that $u_1 = f(v_1)$ and $u_2 = f(v_2)$.
Furthermore, by the previous observation,


$\{u_1, u_2\} \in E \iff \{v_1, v_2\} \in E$.


However, this means that for all $u_1$ and $u_2$ in V, we have


$\{f^{-1}(u_1), f^{-1}(u_2)\} \in E \iff \{u_1, u_2\} \in E$. (8.2)


Thus, $f^{-1}$ is a bijection that preserves the edge relation. This
completes the proof.


Remark 8.19. We use the next lemma in the following section.
Let $f : V \to V$ and $g : V \to V$ be automorphisms of a graph
G. Then, $f \circ g : V \to V$ is the function with the property that
$(f \circ g)(v) = f(g(v))$. This is just function composition.


Lemma 8.20 (Composition). Let $G = (V, E)$ be a graph. Suppose that
$f : V \to V$ and $g : V \to V$ are automorphisms. Then, $f \circ g$
is also an automorphism.


8.3 Groups


Definition 8.21 (Group). A group is a pair $(S, \circ)$, where S is a
set and $\circ : S \times S \to S$ is a binary operation so that:


(1) The binary operation $\circ$ is associative; that is, if $s_1$,
$s_2$, and $s_3$ are in S, then
$(s_1 \circ s_2) \circ s_3 = s_1 \circ (s_2 \circ s_3)$.

(2) There is a unique identity element $e \in S$ so that for all
$s \in S$, $e \circ s = s \circ e = s$.

(3) For every element $s \in S$, there is an inverse element
$s^{-1} \in S$ so that $s \circ s^{-1} = s^{-1} \circ s = e$.


If $\circ$ is commutative, i.e., for all $s_1, s_2 \in S$, we have
$s_1 \circ s_2 = s_2 \circ s_1$, then $(S, \circ)$ is called a
commutative group (or abelian group).


Definition 8.22 (Subgroup). Let $(S, \circ)$ be a group. A subgroup of
$(S, \circ)$ is a group $(T, \circ)$ so that $T \subseteq S$. The
subgroup $(T, \circ)$ shares the identity of the group $(S, \circ)$.


Example 8.23. The integers under addition $(\mathbb{Z}, +)$ is a
group. We can see this because the sum of two integers is an integer.
The additive identity is the number 0 because $0 + n = n + 0 = n$. The
additive inverse of a number $n \in \mathbb{Z}$ is
$-n \in \mathbb{Z}$. Addition is associative. Therefore,
$(\mathbb{Z}, +)$ is a group.


Example 8.24. Consider the group $(\mathbb{Z}, +)$. If $2\mathbb{Z}$
is the set of even integers, then $(2\mathbb{Z}, +)$ is a subgroup of
$(\mathbb{Z}, +)$ because the even integers contain 0 and are closed
under addition and negation.


Remark 8.25. Recall from Lemma 8.20 that the set of automorphisms
of a graph G is closed under function composition $\circ$.


Theorem 8.26. Let G = (V,E) be a graph. Let Aut(G) be the set 
of all automorphisms on G. Then, (Aut(G),0) is a group. 


Proof. By Lemma 8.20, we can see that functional composition is a
binary operation
$\circ : \mathrm{Aut}(G) \times \mathrm{Aut}(G) \to \mathrm{Aut}(G)$.
Associativity is a property of functional composition since if
$f : V \to V$, $g : V \to V$, and $h : V \to V$, it is easy to see that
for all $v \in V$,


$((f \circ g) \circ h)(v) = (f \circ g)(h(v)) = f(g(h(v))) = f((g \circ h)(v)) = (f \circ (g \circ h))(v)$. (8.3)


The identity function $e : V \to V$ defined by $e(v) = v$ for all
$v \in V$ is an automorphism of G. Finally, by Lemma 8.18, each
element of Aut(G) has an inverse. This completes the proof.
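
For a small graph, the statement of Theorem 8.26 can be verified by brute force. The following sketch rests on assumptions of our own: vertices are numbered 0 through n - 1, a bijection is stored as a tuple p with p[v] the image of v, and the function names are illustrative. It enumerates all n! bijections, so it is only suitable for very small graphs.

```python
from itertools import permutations


def automorphisms(n, edges):
    """All automorphisms of a graph on vertices 0..n-1.

    Each automorphism is a tuple p with p[v] the image of vertex v.
    """
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if all((frozenset({p[a], p[b]}) in edge_set)
                   == (frozenset({a, b}) in edge_set)
                   for a in range(n) for b in range(a + 1, n))]


def compose(f, g):
    """The composition f o g, i.e., the map v -> f(g(v))."""
    return tuple(f[g[v]] for v in range(len(g)))


def inverse(f):
    """The inverse bijection of f."""
    inv = [0] * len(f)
    for v, image in enumerate(f):
        inv[image] = v
    return tuple(inv)


def is_group(elements):
    """Check identity, closure under composition, and inverses."""
    s = set(elements)
    identity = tuple(range(len(elements[0])))
    return (identity in s
            and all(compose(f, g) in s for f in s for g in s)
            and all(inverse(f) in s for f in s))
```

Associativity is not tested separately since, as noted in the proof, it holds for function composition in general.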


8.4 Permutation Groups and Graph Automorphisms 


Definition 8.27 (Permutation/Permutation group). A permutation
on a set V = {1,...,n} of n elements is a bijective mapping f from V
to itself. A permutation group on a set V is a set of permutations of
V that forms a group under the binary operation of functional
composition.


Remark 8.28. A graph automorphism is just a permutation of
the vertices. Consequently, when we are studying the automorphism
group of a graph, we are just studying a permutation group.


Example 8.29. Consider the set V = {1,2,3,4}. A permutation
on this set that maps 1 to 2, 2 to 3, and 3 to 1 can be written as
(1,2,3)(4), indicating the cyclic behavior that
$1 \to 2 \to 3 \to 1$ with 4 fixed. In general, we write (1,2,3)
instead of (1,2,3)(4) and suppress any elements that do not move under
the permutation.


Example 8.30. Consider the set V = {1,2,3}. The symmetric
group on V is the set $S_3$, and it contains the permutations:


(1)(2)(3) = e (the identity),
(1,2)(3) = (1,2),
(1,3)(2) = (1,3),
(2,3)(1) = (2,3),
(1,2,3),
(1,3,2).


Example 8.31. For the permutation f taking 1 to 3, 3 to 1, 2 to
4, and 4 to 2, we write f = (1,3)(2,4) and say that this is the product
of (1,3) and (2,4). When determining the action of a permutation
on a number, we read the permutation from right to left. Thus, if
we want to determine the action of f on 2 (i.e., to compute f(2)),
we read from right to left and see that 2 goes to 4. By contrast, if we
had the permutation g = (1,3)(1,2), then to compute g(2), we see
that we take 2 to 1 first and then 1 to 3; thus, 2 would be mapped
to 3. The number 1 would first be mapped to 2, and then we would stop
(since 2 does not appear in (1,3)). The number 3 would be mapped to 1.
Thus, we can see that (1,3)(1,2) has the same action as the
permutation (1,2,3).
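
The right-to-left convention for products of cycles is easy to get wrong by hand, so a small sketch may help. Here a permutation of {1,...,n} is stored as a dictionary from each element to its image; this representation and the function names cycle_to_map and product_of_cycles are our own illustrative choices.

```python
def cycle_to_map(cycle, n):
    """The permutation of {1..n} given by a single cycle, as a dict."""
    mapping = {i: i for i in range(1, n + 1)}
    for k, a in enumerate(cycle):
        mapping[a] = cycle[(k + 1) % len(cycle)]
    return mapping


def product_of_cycles(cycles, n):
    """Product of cycles, applying the rightmost cycle first."""
    result = {i: i for i in range(1, n + 1)}
    for cycle in reversed(cycles):
        step = cycle_to_map(cycle, n)
        result = {i: step[result[i]] for i in result}
    return result


g = product_of_cycles([(1, 3), (1, 2)], 3)   # the permutation (1,3)(1,2)
f = product_of_cycles([(1, 3), (2, 4)], 4)   # the permutation (1,3)(2,4)
```

Here g sends 1 to 2, 2 to 3, and 3 to 1, which agrees with the cycle (1,2,3) as claimed in Example 8.31.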


Definition 8.32 (Transposition). A permutation of the form
$(a_1, a_2)$ is called a transposition.


Definition 8.33 (Symmetric group). Consider a set V with n
elements in it. The permutation group $S_n$ contains every possible
permutation of the set with n elements.


Proposition 8.34. For each n, $|S_n| = n!$.


Theorem 8.35. Every permutation can be expressed as a product of
transpositions.


Proof. Every permutation is a product of cycles, so it suffices to
consider a cycle $(a_1, a_2, \dots, a_n)$. We may write
\[ (a_1, a_2, \dots, a_n) = (a_1, a_n)(a_1, a_{n-1}) \cdots (a_1, a_2). \tag{8.4} \]


Observe the effect of these two permutations on $a_i$. For $i \neq 1$ and
$i \neq n$, reading from right to left (as the permutation is applied), we
see that $a_i$ maps to $a_1$, which, reading further from right to left, is
mapped to $a_{i+1}$, as we would expect. If $i = 1$, then $a_1$ maps to $a_2$,
and there is no further mapping. Finally, if $i = n$, then we read right
to left to the only transposition containing $a_n$ and see that $a_n$ maps
to $a_1$. Thus, Eq. (8.4) holds. This completes the proof.
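
Equation (8.4) is easy to test on a small case. The sketch below is illustrative only; the function names are our own, and cycles are stored as tuples. It builds the transpositions $(a_1, a_n), \dots, (a_1, a_2)$ for a 5-cycle and compares their product with the cycle itself.

```python
def apply_product(cycles, x):
    """Apply a product of cycles to x, reading the product from right to left."""
    for cycle in reversed(cycles):
        if x in cycle:
            x = cycle[(cycle.index(x) + 1) % len(cycle)]
    return x


def cycle_as_transpositions(cycle):
    """Return [(a1, an), (a1, a(n-1)), ..., (a1, a2)], the right side of Eq. (8.4)."""
    first = cycle[0]
    return [(first, a) for a in reversed(cycle[1:])]


cycle = (1, 2, 3, 4, 5)
transpositions = cycle_as_transpositions(cycle)
same_action = all(
    apply_product([cycle], x) == apply_product(transpositions, x) for x in cycle
)
print(transpositions)
print(same_action)
```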


Remark 8.36. The following theorem is useful for our work on 
matrices in the second part of this chapter, but its proof is outside 
the scope of these notes. The interested reader can see Chapter 2.2 
of Ref. [80]. 


Theorem 8.37. No permutation can be expressed as both a product 
of an even and an odd number of transpositions. 


Definition 8.38 (Even/Odd permutation). Let $\sigma \in S_n$ be a
permutation. If $\sigma$ can be expressed as an even number of
transpositions, then it is even; otherwise, $\sigma$ is odd. The signature of the
permutation is
\[ \operatorname{sgn}(\sigma) = \begin{cases} -1 & \sigma \text{ is odd}, \\ 1 & \sigma \text{ is even}. \end{cases} \tag{8.5} \]
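
One way to compute the signature, sketched here under our own conventions, uses the proof of Theorem 8.35: a cycle of length $m$ is a product of $m - 1$ transpositions, so we can add up $m - 1$ over the cycles of $\sigma$. The function name `sign` and the 0-indexed tuple encoding (entry `perm[i]` is the image of `i`) are assumptions made for this illustration.

```python
from itertools import permutations


def sign(perm):
    """Signature of a permutation given as a tuple: perm[i] is the image of i."""
    seen = set()
    transposition_count = 0
    for start in range(len(perm)):
        if start in seen:
            continue
        # Walk around the cycle containing `start` and record its length.
        length = 0
        x = start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        transposition_count += length - 1  # a cycle of length m uses m - 1 transpositions
    return 1 if transposition_count % 2 == 0 else -1


signs = [sign(p) for p in permutations(range(3))]
print(signs.count(1), signs.count(-1))  # even and odd elements of S_3
```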


Remark 8.39. Let $G = (V, E)$ be a graph. If $f \in \operatorname{Aut}(G)$, then $f$ is
a permutation on the vertices of G. Thus, the graph automorphism 
group is just a permutation group that respects vertex adjacency. 


Fig. 8.3 The graph $K_3$ has six automorphisms, one for each element in $S_3$, the
set of all permutations on three objects. These automorphisms are: (i) the identity
automorphism that maps all vertices to themselves; (ii) the automorphism that
exchanges vertices 1 and 2; (iii) the automorphism that exchanges vertices 1 and
3; (iv) the automorphism that exchanges vertices 2 and 3; (v) the automorphism
that sends 1 to 2, 2 to 3, and 3 to 1; and (vi) the automorphism that sends 1 to
3, 3 to 2, and 2 to 1. The figure labels the counter-clockwise and clockwise
rotations and the flips $(1,3)(2)$, $(1,2)(3)$, and $(2,3)(1)$.


Example 8.40. Consider the graph $K_3$, the complete graph on three
vertices (see Fig. 8.3(a)). The graph $K_3$ has six automorphisms, one
for each element in $S_3$, the set of all permutations on three objects.
These automorphisms are: (i) the identity automorphism that maps
all vertices to themselves, which is the identity permutation $e$; (ii) the
automorphism that exchanges vertices 1 and 2, which is the
permutation $(1,2)$; (iii) the automorphism that exchanges vertices 1
and 3, which is the permutation $(1,3)$; (iv) the automorphism that
exchanges vertices 2 and 3, which is the permutation $(2,3)$; (v) the
automorphism that sends 1 to 2, 2 to 3, and 3 to 1, which is the
permutation $(1,2,3)$; and (vi) the automorphism that sends 1 to 3,
3 to 2, and 2 to 1, which is the permutation $(1,3,2)$.

Note that each of these automorphisms is illustrated by a symmetry
in the graphical representation of $K_3$. The permutations $(1,2)$,
$(1,3)$, and $(2,3)$ are flips about an axis of symmetry, while the
permutations $(1,2,3)$ and $(1,3,2)$ are rotations. This is illustrated in
Fig. 8.3.

It should be noted that this method of drawing a graph to find its
automorphism group does not work in general, but for some graphs
(such as complete graphs or cycle graphs), this can be useful.
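
When a drawing does not help, a small graph can simply be searched exhaustively. The following sketch is a brute-force illustration, not an efficient algorithm; the function name `automorphisms` is our own, and the graph is assumed to be simple. It tries every permutation of the vertices and keeps those that map the edge set onto itself.

```python
from itertools import permutations


def automorphisms(vertices, edges):
    """Return every vertex permutation (as a dict) that preserves the edge set."""
    edge_set = {frozenset(e) for e in edges}
    found = []
    for image in permutations(vertices):
        f = dict(zip(vertices, image))
        mapped = {frozenset((f[u], f[v])) for u, v in edges}
        if mapped == edge_set:  # f sends edges to edges (and non-edges to non-edges)
            found.append(f)
    return found


k3 = automorphisms([1, 2, 3], [(1, 2), (1, 3), (2, 3)])
path = automorphisms([1, 2, 3], [(1, 2), (2, 3)])
print(len(k3), len(path))
```

Every permutation survives for $K_3$, while the path on three vertices keeps only the identity and the exchange of its two end vertices. The search examines $n!$ permutations, so it is only practical for very small graphs.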


Lemma 8.41. The automorphism group of $K_n$ is $S_n$; thus,
$|\operatorname{Aut}(K_n)| = n!$.


Definition 8.42 (Star graph). A star graph on $n+1$ vertices
(unfortunately denoted $S_n$) is a graph with vertex set $V =
\{v_0, \dots, v_n\}$ and edge set $E$ so that
\[ e \in E \iff e = \{v_0, v_i\}, \quad i \in \{1, \dots, n\}. \]
Thus, the graph $S_n$ has $n+1$ vertices and $n$ edges.


Remark 8.43. It is unfortunate that the symmetric group on $n$
items and the star graph with $n+1$ vertices have the same notation.
We differentiate between the two explicitly to prevent confusion. It
is also worth noting that some references define the star graph $S_n$ to
have $n$ vertices and $n-1$ edges.


Example 8.44. The star graph $S_3$ with four vertices and three edges
is shown in Fig. 8.4, as is the graph $S_9$.


Remark 8.45. We end this chapter with a simple proposition show- 
ing that it is not only complete graphs that can have large automor- 
phism groups. 


Proposition 8.46. For $n \geq 2$, the automorphism group of the star
graph $S_n$ has $n!$ elements.
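
For small $n$, the count in Proposition 8.46 can be confirmed by exhaustive search. This is a sketch under our own naming (`count_automorphisms`, `star_graph`), with the center of the star labeled 0; it is a numerical check, not a proof.

```python
from itertools import permutations
from math import factorial


def count_automorphisms(vertices, edges):
    """Count vertex permutations of a simple graph that preserve its edge set."""
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for image in permutations(vertices):
        f = dict(zip(vertices, image))
        if {frozenset((f[u], f[v])) for u, v in edges} == edge_set:
            count += 1
    return count


def star_graph(n):
    """Star graph S_n: center 0 joined to each of the leaves 1, ..., n."""
    vertices = list(range(n + 1))
    edges = [(0, i) for i in range(1, n + 1)]
    return vertices, edges


counts = {n: count_automorphisms(*star_graph(n)) for n in range(2, 6)}
print(counts)
print(all(counts[n] == factorial(n) for n in counts))
```

The agreement reflects the fact that, once $n \geq 2$, the center is the only vertex of degree $n$ and must be fixed, while the leaves can be permuted freely.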


Fig. 8.4 The star graphs (a) $S_3$ and (b) $S_9$.


8.5 Chapter Notes


In this chapter, we have shown that any graph automorphism is just a 
permutation on the vertices. Consequently, the identity permutation 
(mapping vertex v to itself for all v) is always an automorphism of a 
graph. This is called the trivial automorphism. Determining whether 
a graph has more automorphisms is called the graph automorphism 
problem. Just as the graph isomorphism problem is of unknown com- 
plexity, it is also unknown whether the graph automorphism problem 
is NP-complete [81]. An automorphism $f : V \to V$ has a fixed point
if there is a $v \in V$ so that $f(v) = v$. The question of determining
whether a graph has an automorphism with no fixed points is known 
to be NP-complete [81]. The idea of using visual symmetries to find 
graph automorphisms (as we did for C3) can be extended [82, 83] 
to larger graphs, and there are computer programs that try to do 
this. 

It turns out that there is a deeper relationship between graphs and 
groups. In 1936, König conjectured [84] that every finite group is the
automorphism group of some graph. This was proved by Frucht [85] 
in 1939. It is now known as Frucht’s theorem. The proof approach 
uses a Cayley graph analysis. Cayley graphs (named after Arthur 
Cayley) encode information about a group’s multiplication table. To 
fully understand them, we need the idea of a generating set of a 
group. Suppose $(S, \circ)$ is a group. A generating set $H \subseteq S$ is a subset
of $S$ whose elements and their inverses can be used to reconstruct
the entire group. For example, $\{1\}$ generates the group of integers
under addition because $n = 1 + 1 + \cdots + 1$, while $-n = (-1) + (-1) +
\cdots + (-1)$, where each sum has $n$ terms. Similarly, the rotations we
discussed when building the automorphisms of $C_3$ are generated by
a single $2\pi/3$ radian rotation (which is then applied to itself over
and over). The automorphism group of every cycle is generated by a 
two-element set, and this group is isomorphic to the symmetry group 
of the regular polygons (called the dihedral group). To construct a 
Cayley graph: 


(1) assign each element in $S$ to a vertex;

(2) assign each element $h \in H$ a color $c_h$;

(3) for each $g \in S$ and $h \in H$, add the directed edge $(g, gh)$ with
color $c_h$.
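
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under our own conventions: group elements are permutations of the vertices $0, 1, 2, 3$ of $C_4$ stored as tuples, the product $gh$ means "apply $h$, then $g$", and the two generators (a rotation and a flip) are our choice of generating set.

```python
def compose(g, h):
    """Return the product gh as a tuple: apply h first, then g."""
    return tuple(g[h[x]] for x in range(len(h)))


def generate_group(generators):
    """Close the generating set under multiplication, starting from the identity."""
    identity = tuple(range(len(generators[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(g, h)
            if gh not in elements:
                elements.add(gh)
                frontier.append(gh)
    return elements


def cayley_graph(generators):
    """Return directed, colored edges (g, gh, color index of h)."""
    elements = generate_group(generators)
    return {
        (g, compose(g, h), color)
        for g in elements
        for color, h in enumerate(generators)
    }


rotation = (1, 2, 3, 0)  # sends vertex i of C_4 to i + 1 (mod 4)
flip = (0, 3, 2, 1)      # fixes 0 and 2 and exchanges 1 and 3
edges = cayley_graph([rotation, flip])
vertices = {g for g, _, _ in edges}
print(len(vertices), len(edges))
```

In a finite group, the inverse of each generator is one of its powers, so multiplying by the generators alone is enough to reach every element.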
Fig. 8.5 The Cayley graph of $\operatorname{Aut}(C_4)$, which has a generating set of size 2 and
eight elements.


The Cayley graph for the group of automorphisms of C4 is shown 
in Fig. 8.5. Algebraic concepts also enter into algebraic graph the- 
ory through the various graph polynomials that we discussed at the 
end of Chapter 7. Polynomials (and their roots) have a long and 
deep connection with group theory. Readers can consult Ref. [80] for 
details. 


8.6 Exercises 


Exercise 8.1 

Prove that graph isomorphism is an equivalence relation. [Hint:
Recall that an equivalence relation is a binary relation $\sim$ defined
on a set $S$ so that (i) for all $s \in S$, $s \sim s$ (reflexivity); (ii) for all
$s, t \in S$, $s \sim t \iff t \sim s$ (symmetry); and (iii) for all $r, s, t \in S$,
$r \sim s$ and $s \sim t$ implies $r \sim t$ (transitivity). Here, the set is the set
of all graphs.]


Exercise 8.2 

Prove Theorem 8.12. [Hint: The proof does not have to be extensive 
in detail. Simply write enough to convince yourself that the isomor- 
phisms preserve the subgraph property.] 
Exercise 8.3

List some ways to determine that two graphs are not isomorphic. 
That is, what are some tests one might do to see whether two graphs 
are not isomorphic? 


Exercise 8.4 
Prove carefully that if $f$ is a bijection, then so is $f^{-1}$. [Hint: Most of
the proof is in Remark 8.17.]


Exercise 8.5 
Prove Lemma 8.20. 


Exercise 8.6 

Let $(S, \circ)$ be a group with identity $e$. Prove that the set $\{e\}$ with $\circ$
is also a group, called the trivial subgroup of $(S, \circ)$. Conclude that
for graphs with only the identity automorphism, their automorphism
group is trivial.


Exercise 8.7 
Prove Proposition 8.34. 


Exercise 8.8 

Find the automorphism group of the cycle graph $C_4$. Can you find
the automorphism group for $C_k$, $k \geq 3$? How many elements does it
have?


Exercise 8.9 
Prove Lemma 8.41. 


Exercise 8.10 

Show that the automorphism group of the star graph $S_3$ is also
identical to the symmetric permutation group $S_3$. As a result, show
that two non-isomorphic graphs can share an automorphism group.
(Remember that $\operatorname{Aut}(K_3)$ is also the symmetric permutation group
on three elements.)


Exercise 8.11 
Prove Proposition 8.46. 
Exercise 8.12

(Project) Study the problem of graph automorphism in detail. 
Explore the computational complexity of determining the automor- 
phism group of a graph or a family of graphs. Explore any automor- 
phism groups for specific types of graphs, such as cycle graphs, star 
graphs, and hypercubes. 


Chapter 9 


Algebraic Graph Theory 
with Linear Algebra 


Remark 9.1 (Chapter goals). Our goal in this chapter is to dis- 
cuss algebraic graph theory from the perspective of linear algebra. 
We assume that the reader is familiar with matrix operations, eigen- 
values, and eigenvectors. A review of these topics is contained in 
Appendix A. In this chapter, we discuss the various matrices one 
can construct from a graph. We study some of the properties of the 
spectra (eigenvalues) of these matrices. We conclude by introducing 
the Perron–Frobenius theorem, which we use in the following chapter
to discuss applications of algebraic graph theory. 


9.1 Matrix Representations of Graphs 


Definition 9.2 (Adjacency matrix). Let $G = (V,E)$ be a graph
and assume that $V = \{v_1, \dots, v_n\}$. The adjacency matrix of $G$ is an
$n \times n$ matrix $M$ defined as
\[ M_{ij} = \begin{cases} 1 & \{v_i, v_j\} \in E, \\ 0 & \text{otherwise}. \end{cases} \]


Proposition 9.3. The adjacency matrix of a (simple) graph is sym- 
metric. 


Theorem 9.4. Let $G = (V,E)$ be a graph with $V = \{v_1, \dots, v_n\}$,
and let $M$ be its adjacency matrix. For $k \geq 0$, the $(i,j)$ entry of $M^k$
is the number of walks of length $k$ from $v_i$ to $v_j$.




Proof. We proceed by induction. By definition, $M^0$ is the $n \times n$
identity matrix, and the number of walks of length 0 between $v_i$ and
$v_j$ is 0 if $i \neq j$ and 1 otherwise; thus, the base case is established.

Now, suppose that the $(i,j)$th entry of $M^k$ is the number of walks
of length $k$ from $v_i$ to $v_j$. We show that this is true for $k + 1$. We
know that
\[ M^{k+1} = M^k M. \tag{9.1} \]

Consider vertices $v_i$ and $v_j$. The $(i,j)$th element of $M^{k+1}$ is
\[ M^{k+1}_{ij} = \left(M^k\right)_{i\cdot} M_{\cdot j}. \tag{9.2} \]
Let
\[ M^k_{i\cdot} = \begin{bmatrix} r_1 & \cdots & r_n \end{bmatrix}, \tag{9.3} \]
where $r_l$, $l = 1, \dots, n$, is the number of walks of length $k$ from $v_i$ to
$v_l$, by the induction hypothesis. Let
\[ M_{\cdot j} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}, \tag{9.4} \]
where $b_l$, $l = 1, \dots, n$, is 1 if and only if there is an edge $\{v_l, v_j\} \in E$
and 0 otherwise. Then, the $(i,j)$th term of $M^{k+1}$ is
\[ M^{k+1}_{ij} = M^k_{i\cdot} M_{\cdot j} = \sum_{l=1}^{n} r_l b_l. \tag{9.5} \]
This is the total number of walks of length $k$ leading to a vertex $v_l$,
$l = 1, \dots, n$, from vertex $v_i$ such that there is also an edge connecting
$v_l$ to $v_j$. Thus, $M^{k+1}_{ij}$ is the number of walks of length $k + 1$ from $v_i$
to $v_j$. The result follows by induction.


Example 9.5. Consider the graph in Fig. 9.1. The adjacency matrix
for this graph is
\[ M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}. \tag{9.6} \]
Fig. 9.1 The adjacency matrix of a graph with $n$ vertices is an $n \times n$ matrix with
a 1 at element $(i,j)$ if and only if there is an edge connecting vertex $i$ to vertex
$j$; otherwise, element $(i,j)$ is a zero.


Consider $M^2$:
\[ M^2 = \begin{bmatrix} 3 & 1 & 1 & 2 \\ 1 & 2 & 2 & 1 \\ 1 & 2 & 2 & 1 \\ 2 & 1 & 1 & 3 \end{bmatrix}. \tag{9.7} \]


This tells us that there are three distinct walks of length 2 from
vertex $v_1$ to itself. These walks are


(1) $(v_1, \{v_1, v_2\}, v_2, \{v_1, v_2\}, v_1)$,
(2) $(v_1, \{v_1, v_3\}, v_3, \{v_1, v_3\}, v_1)$, and
(3) $(v_1, \{v_1, v_4\}, v_4, \{v_1, v_4\}, v_1)$.


We also see that there is one walk of length 2 from $v_1$ to $v_2$:
$(v_1, \{v_1, v_4\}, v_4, \{v_2, v_4\}, v_2)$. We can verify each of the other
numbers of walks in $M^2$.
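
The entries of the matrix power can be compared with a direct enumeration of walks. The following sketch uses the adjacency matrix of Example 9.5 with vertices indexed from 0; the helper names `mat_mul`, `mat_pow`, and `count_walks` are our own.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, k):
    """Return the k-th power of m, with the identity matrix for k = 0."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


def count_walks(m, i, j, k):
    """Count walks of length k from vertex i to vertex j by direct recursion."""
    if k == 0:
        return int(i == j)
    return sum(count_walks(m, l, j, k - 1) for l in range(len(m)) if m[i][l])


M = [[0, 1, 1, 1],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [1, 1, 1, 0]]

M2 = mat_pow(M, 2)
M3 = mat_pow(M, 3)
agree = all(count_walks(M, i, j, 3) == M3[i][j]
            for i in range(4) for j in range(4))
print(M2)
print(agree)
```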


Definition 9.6 (Directed adjacency matrix). Let $G = (V,E)$
be a directed graph, and assume that $V = \{v_1, \dots, v_n\}$. The
adjacency matrix of $G$ is an $n \times n$ matrix $M$ defined as
\[ M_{ij} = \begin{cases} 1 & (v_i, v_j) \in E, \\ 0 & \text{otherwise}. \end{cases} \]


Theorem 9.7. Let $G = (V,E)$ be a digraph with $V = \{v_1, \dots, v_n\}$,
and let $M$ be its adjacency matrix. For $k \geq 0$, the $(i,j)$ entry of $M^k$
is the number of directed walks of length $k$ from $v_i$ to $v_j$.


Definition 9.8 (Incidence matrix). Let $G = (V,E)$ be a graph
with $V = \{v_1, \dots, v_m\}$ and $E = \{e_1, \dots, e_n\}$. Then, the incidence
matrix of $G$ is an $m \times n$ matrix $A$ with
\[ A_{ij} = \begin{cases} 0 & \text{if } v_i \text{ is not in } e_j, \\ 1 & \text{if } v_i \text{ is in } e_j \text{ and } e_j \text{ is not a self-loop}, \\ 2 & \text{if } v_i \text{ is in } e_j \text{ and } e_j \text{ is a self-loop}. \end{cases} \tag{9.8} \]


Theorem 9.9. Let $G = (V,E)$ be a graph with $V = \{v_1, \dots, v_m\}$
and $E = \{e_1, \dots, e_n\}$ with incidence matrix $A$. The sum of every
column in $A$ is 2, and the sum of each row in $A$ is the degree of the
vertex corresponding to that row.


Proof. Consider any column in $A$; it corresponds to an edge $e$ of
$G$. If the edge is a self-loop, there is only one vertex incident to $e$
and thus only one nonzero entry in this column, with value 2. Therefore,
its sum is 2. Otherwise, $e$ connects two distinct vertices, so there are
precisely two vertices incident to $e$ and thus two entries in this column
that are nonzero, both with value 1; thus, again, the sum of the column is 2.

Now, consider any row in $A$; it corresponds to a vertex $v$ of $G$.
The entries in this row are 1 for each non-loop edge that is incident
to $v$ and 2 for each self-loop at $v$. From Definition 1.13, we see
that adding these values up yields the degree of the vertex $v$. This
completes the proof.
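
The two sums in Theorem 9.9 can be checked numerically. In the sketch below, the function name `incidence_matrix` and the example graph (the five edges of Fig. 9.1 plus a hypothetical self-loop added at one vertex to exercise the value 2) are our own choices; vertices are indexed from 0.

```python
def incidence_matrix(num_vertices, edges):
    """Rows are vertices, columns are edges; a self-loop (v, v) contributes 2."""
    a = [[0] * len(edges) for _ in range(num_vertices)]
    for j, (u, v) in enumerate(edges):
        a[u][j] += 1
        a[v][j] += 1  # for a self-loop u == v, so the single entry becomes 2
    return a


# Five ordinary edges and one self-loop at vertex 2 (added for illustration).
edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (2, 2)]
A = incidence_matrix(4, edges)

column_sums = [sum(row[j] for row in A) for j in range(len(edges))]
row_sums = [sum(row) for row in A]
print(column_sums)  # one entry per edge
print(row_sums)     # one entry per vertex: its degree
```

Because every column sums to 2, adding up all the row sums counts each edge twice, which is the observation behind Exercise 9.5.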


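Definition 9.10 (Directed incidence matrix). Let $G = (V, E)$
be a digraph with $V = \{v_1, \dots, v_m\}$ and $E = \{e_1, \dots, e_n\}$. Then, the
incidence matrix of $G$ is an $m \times n$ matrix $A$ with
\[ A_{ij} = \begin{cases} 0 & \text{if } v_i \text{ is not in } e_j, \\ 1 & \text{if } v_i \text{ is the source of } e_j \text{ and } e_j \text{ is not a self-loop}, \\ -1 & \text{if } v_i \text{ is the destination of } e_j \text{ and } e_j \text{ is not a self-loop}, \\ 2 & \text{if } v_i \text{ is in } e_j \text{ and } e_j \text{ is a self-loop}. \end{cases} \tag{9.9} \]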


Remark 9.11. The incidence matrices of simple directed graphs
(those with no self-loops) have very useful properties, which are
discussed in Chapter 12. In particular, these matrices have the property
that every square sub-matrix has a determinant that is either 1, $-1$,
or 0. This property is called total unimodularity, and it is particularly
important in the analysis of network flows.


9.2 Properties of the Eigenvalues of 
the Adjacency Matrix 


Remark 9.12. Going forward, we assume that the reader is familiar 
with basic facts about matrix eigenvalues. This information can be 
found in Appendix A. 


Lemma 9.13 (Rational root theorem). Let $a_n x^n + \cdots + a_1 x +
a_0 = 0$ for $x = p/q$ with $\gcd(p,q) = 1$ and $a_n, \dots, a_0 \in \mathbb{Z}$. That is, $p/q$
is a rational root of the equation. Then, $p$ is an integer factor of $a_0$
and $q$ is an integer factor of $a_n$.


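Remark 9.14. The following theorem follows from the spectral
theorem for real symmetric matrices (see Theorem A.97) and the
rational root theorem. For the second part, note that the characteristic
polynomial of an adjacency matrix is monic with integer coefficients,
so any rational root $p/q$ in lowest terms must have $q = \pm 1$.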


Theorem 9.15 (Graph spectrum). Let $G = (V,E)$ be a graph
with adjacency matrix $M$. The following hold:


(1) Every eigenvalue of $M$ is real.
(2) If $\lambda$ is a rational eigenvalue of $M$, then it is an integer.


Remark 9.16. Two graphs that are not isomorphic can have the 
same set of eigenvalues. This can be illustrated through an example 
that can be found in Chapter 8 of Ref. [39]. The graphs are shown 
in Fig. 9.2. We can see that the two graphs are not isomorphic since
there is no vertex in graph $G_1$ that has a degree of 6, unlike vertex 7
of graph $G_2$. However, one can determine (using a computer) that
their adjacency matrices share the same set of eigenvalues. 


Definition 9.17 (Irreducible matrix). A matrix $M \in \mathbb{R}^{n \times n}$ is
irreducible if for each pair $(i,j)$, there is some $k \in \mathbb{Z}$ with $k > 0$ so
that $M^k_{ij} > 0$.


Lemma 9.18. If $G = (V,E)$ is a connected graph with adjacency
matrix $M$, then $M$ is irreducible.
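
Definition 9.17 can be tested directly for a small matrix with non-negative entries. The sketch below is our own illustration: it assumes that it is enough to examine the powers $M^1, \dots, M^n$ (for a non-negative matrix, a shortest walk between two vertices never needs more than $n$ steps), and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def is_irreducible(m):
    """Check that every (i, j) entry is positive in some power M^k, 1 <= k <= n."""
    n = len(m)
    power = [row[:] for row in m]
    positive = [[m[i][j] > 0 for j in range(n)] for i in range(n)]
    for _ in range(n - 1):
        power = mat_mul(power, m)
        for i in range(n):
            for j in range(n):
                positive[i][j] = positive[i][j] or power[i][j] > 0
    return all(all(row) for row in positive)


connected = [[0, 1, 1, 1],
             [1, 0, 0, 1],
             [1, 0, 0, 1],
             [1, 1, 1, 0]]
two_components = [[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]]
print(is_irreducible(connected), is_irreducible(two_components))
```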




Fig. 9.2 Two graphs, (a) $G_1$ and (b) $G_2$, with the same eigenvalues that
are not isomorphic.


Theorem 9.19 (Perron–Frobenius theorem). If $M$ is an irreducible
matrix, then $M$ has an eigenvalue $\lambda_0$ with the following
properties:


(1) The eigenvalue $\lambda_0$ is positive, and if $\lambda$ is an alternative eigenvalue
of $M$, then $\lambda_0 \geq |\lambda|$.

(2) The matrix $M$ has an eigenvector $v_0$ corresponding to $\lambda_0$ with
only positive entries when properly scaled.

(3) The eigenvalue $\lambda_0$ is a simple root of the characteristic equation
for $M$ and, therefore, has a unique (up to scale) eigenvector $v_0$.

(4) The eigenvector $v_0$ is the only eigenvector of $M$ that can have
all positive entries when properly scaled.


Corollary 9.20. If G = (V,E) is a connected graph with the adja- 
cency matrix M, then it has a unique largest eigenvalue that corre- 
sponds to an eigenvector that is positive when properly scaled. 


Proof. Applying Lemma 9.18, we see that $M$ is irreducible.
Furthermore, we know that there is an eigenvalue $\lambda_0$ of $M$ that is (i)
greater than or equal to, in absolute value, all other eigenvalues of $M$
and (ii) a simple root. From Theorem 9.15, we know that all
eigenvalues of $M$ are real. But for (i) and (ii) to hold, no other (real)
eigenvalue can have value equal to $\lambda_0$ (otherwise, it would not be a
simple root). Thus, $\lambda_0$ is the unique largest eigenvalue of $M$. This
completes the proof.


9.3 Chapter Notes


Matrices (in the form used in this chapter) are a relatively recent 
invention, though arrays of numbers have been used for centuries. 
The term “matrix” was first used by J. J. Sylvester in Ref. [86]. At 
this time, matrices were mainly used as generators of determinants. 
Cayley (of Cayley graphs from the notes in Chapter 8) used matrices 
in his work on geometric transformations [87] and proved the basic 
algebraic properties of matrix arithmetic. He later wrote a treatise 
on matrices [88]. According to Ref. [87], Cullis was the first to use 
the square-bracket notation in 1913 [89]. 

The adjacency matrix of a graph was known and used at least 
by Pólya and Szegő [90] in 1925. This fact is mentioned by Harary
[91] in his 1962 paper on determinants of adjacency matrices. In
particular, Harary states that Pólya suggested the paper to him.
The question of non-isomorphic graphs that share spectra (adjacency 
matrix eigenvalues) is one considered in Ref. [91]. Questions on the 
properties of the spectra of graphs are also studied in Refs. [92,93]. 
The incidence matrix, which we did not discuss in detail, returns in 
Chapter 12, where it appears as the constraint matrix of the linear 
programming problem that arises from network flow problems. 

Oskar Perron was not a graph theorist. He was a professor of 
mathematics at the University of Heidelberg, where he studied ordi- 
nary and partial differential equations. Likewise, Frobenius was a 
mathematician who studied differential equations (as well as
number and group theory). The Perron–Frobenius theorem was proved
independently by the two mathematicians. This result has found 
applications ranging from graph theory (see the following chapter) 
to economics [94] and even ranking football (soccer) teams [95,96]. 

The Perron–Frobenius theorem is a classical result in linear
algebra with several proofs (see Ref. [97]). Meyer says the following about
the theorem: 


In addition to saying something useful, the Perron–Frobenius
theory is elegant. It is a testament to the fact that beautiful 
mathematics eventually tends to be useful, and useful mathe- 
matics eventually tends to be beautiful. 
One should note that you can say more than we state in our
presentation of the theorem. See Chapter 8 of Ref. [97] for details and
a proof. In the next chapter, we discuss several applications of the
Perron–Frobenius theorem arising from graphs.


9.4 Exercises 


Exercise 9.1 
Find the adjacency matrix for the graph $C_4$. Can you describe the
adjacency matrix for an arbitrary cycle $C_k$?


Exercise 9.2 
Prove Proposition 9.3. 


Exercise 9.3 

Devise an inefficient test for isomorphism between two graphs G and 
G’ using their adjacency matrix representations. Assume that it takes 
1 time unit to test whether two n x n matrices are equal. What is 
the maximum amount of time your algorithm takes to determine that 
$G \not\cong G'$? [Hint: Continue to reorder the vertices of $G'$ and test the
adjacency matrices for equality.]


Exercise 9.4 
Prove Theorem 9.7. [Hint: Use the approach in the proof of Theo- 
rem 9.4.] 


Exercise 9.5 
Use Theorem 9.9 to prove Theorem 2.10 a new way. 


Exercise 9.6 

(Project) Prove the spectral theorem for real symmetric matrices 
and then use it to obtain Part 1 of Theorem 9.15. Then, prove and 
apply Lemma 9.13 to prove Part 2 of Theorem 9.15. You should 
discuss the proof of the spectral theorem for real symmetric matrices. 
[Hint: All these proofs are available in references or online; expand
on these sources in your own words.]


Exercise 9.7 
Use a computer to show that the two graphs in Remark 9.16 share 
the same set of eigenvalues. 
Exercise 9.8
Prove Lemma 9.18. 


Exercise 9.9 

Find the principal eigenvector/eigenvalue pair of the adjacency
matrix for $S_4$, the star graph with five vertices. By principal
eigenvector/eigenvalue pair, we mean the eigenvalue with the largest
positive value and its corresponding eigenvector.
Chapter 10


Applications of Algebraic 
Graph Theory 


Remark 10.1 (Chapter goals). In this chapter, we explore appli- 
cations of algebraic graph theory. We study additional network cen- 
trality measures, explore Markov chains (a practical application of 
directed graphs), and study spectral clustering of graphs. The first 
two topics are motivated by the Perron–Frobenius theorem, while
the last topic is a direct application of the graph Laplacian matrix. 


10.1 Eigenvector Centrality 


Remark 10.2. The following approach to deriving eigenvector cen- 
trality comes from Leo Spizzirri [98]. What follows is a derivation, 
not a proof. 


Derivation 10.3 (Eigenvector centrality). We can assign to 
each vertex of a graph G = (V,E) a score (called its eigenvector 
centrality) that will determine its relative importance in the graph. 
Here, importance is measured in a self-referential way: Important 
vertices are important precisely because they are adjacent to other 
important vertices. (This is the high-school concept of “coolness by 
association.” ) This self-referential definition can be resolved in the 
following way. 

Let $x_i$ be the (unknown) score of vertex $v_i \in V$, and let $x_i =
\kappa(v_i)$, with $\kappa$ being the function returning the score of each vertex
in $V$. Define $x_i$ as a pseudo-average of the scores of its neighbors.
That is, write
\[ x_i = \frac{1}{\lambda} \sum_{v \in N(v_i)} \kappa(v). \tag{10.1} \]


Here, $\lambda$ will be chosen endogenously during computation.

Recall that $M_{i\cdot}$ is the $i$th row of the adjacency matrix $M$ and
contains a 1 in position $j$ if and only if $v_i$ is adjacent to $v_j$, i.e.,
$v_j \in N(v_i)$. Thus, we can rewrite Eq. (10.1) as
\[ x_i = \frac{1}{\lambda} \sum_{j=1}^{n} M_{ij} x_j. \]


This leads to $n$ equations, one for each vertex in $V$ (or each row
of $M$). Written as a matrix expression, we have
\[ x = \frac{1}{\lambda} M x \implies \lambda x = M x. \tag{10.2} \]


Thus, $x$ is an eigenvector of $M$ and $\lambda$ is its eigenvalue.


Remark 10.4. Clearly, there may be several eigenvectors and
eigenvalues for $M$. The question is, which eigenvalue–eigenvector pair
should be chosen? The answer is to choose the eigenvector with all
positive entries corresponding to the largest eigenvalue. We know
such an eigenvalue–eigenvector pair exists and is unique as a result
of Lemma 9.18 and the Perron–Frobenius theorem (Theorem 9.19).


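Theorem 10.5. Let $G = (V, E)$ be a connected graph with adjacency
matrix $M \in \mathbb{R}^{n \times n}$. Suppose that $\lambda_0$ is the largest real eigenvalue of
$M$ and has the corresponding eigenvector $v_0$. Furthermore, assume
that $|\lambda_0| > |\lambda|$ for any other eigenvalue $\lambda$ of $M$. If $x \in \mathbb{R}^{n \times 1}$ is a
column vector so that $x \cdot v_0 \neq 0$, then
\[ \lim_{k \to \infty} \frac{M^k x}{\lambda_0^k} = \alpha_0 v_0. \tag{10.3} \]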


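Proof. Applying Theorem A.97, we see that the eigenvectors of $M$
must form a basis for $\mathbb{R}^n$. Thus, we can express
\[ x = \alpha_0 v_0 + \alpha_1 v_1 + \cdots + \alpha_{n-1} v_{n-1}. \tag{10.4} \]
Multiplying both sides by $M^k$ yields
\[ M^k x = \alpha_0 M^k v_0 + \alpha_1 M^k v_1 + \cdots + \alpha_{n-1} M^k v_{n-1}
 = \alpha_0 \lambda_0^k v_0 + \alpha_1 \lambda_1^k v_1 + \cdots + \alpha_{n-1} \lambda_{n-1}^k v_{n-1} \tag{10.5} \]
because $M^k v_i = \lambda_i^k v_i$ for any eigenvector $v_i$. Dividing by $\lambda_0^k$ yields
\[ \frac{M^k x}{\lambda_0^k} = \alpha_0 v_0 + \alpha_1 \frac{\lambda_1^k}{\lambda_0^k} v_1 + \cdots + \alpha_{n-1} \frac{\lambda_{n-1}^k}{\lambda_0^k} v_{n-1}. \tag{10.6} \]


Applying our assumption that $\lambda_0 > |\lambda|$ for all other eigenvalues $\lambda$,
we have
\[ \lim_{k \to \infty} \frac{\lambda_i^k}{\lambda_0^k} = 0 \tag{10.7} \]
for $i \neq 0$. Thus,
\[ \lim_{k \to \infty} \frac{M^k x}{\lambda_0^k} = \alpha_0 v_0. \tag{10.8} \]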

Remark 10.6. We can use Theorem 10.5 to justify our definition
of eigenvector centrality as the Perron–Frobenius eigenvector of the
adjacency matrix. Let $x$ be a vector with a 1 at index $i$ and 0
everywhere else. We imagine this vector corresponds to a walker who starts
at vertex $v_i$ in the graph $G$. If $M$ is the adjacency matrix, then $Mx$ is
the $i$th column of $M$, whose $j$th index tells us the number of walks of
length 1 leading from vertex $v_i$ to vertex $v_j$. We can repeat this logic
to see that $M^k x$ gives us a vector whose $j$th element is the number
of walks of length $k$ from $v_i$ to $v_j$.

From Theorem 10.5, we know that (under some suitable conditions),
no matter which vertex we choose in creating $x$,
\[ \lim_{k \to \infty} \frac{M^k x}{\lambda_0^k} = \alpha_0 v_0. \tag{10.9} \]


Reinterpreting Eq. (10.9), we observe that as $k \to \infty$, the eigenvector
centrality of vertex $v_j$ is just counting (a scaled version of) the number
of long walks leading to that vertex. The more walks leading to a
vertex, the more central it is and thus the higher it is ranked by
eigenvector centrality.
Fig. 10.1 A graph with four vertices and five edges. Intuitively, vertices 1 and
4 should share one eigenvector centrality score, and vertices 2 and 3 should
share another.


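Example 10.7. Consider the graph shown in Fig. 10.1. Recall from
Example 9.5 that this graph has the adjacency matrix
\[ M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}. \]
We can use a computer to determine the eigenvalues and eigenvectors
of $M$. The eigenvalues are
\[ \left\{ \tfrac{1}{2}\left(1 + \sqrt{17}\right),\ \tfrac{1}{2}\left(1 - \sqrt{17}\right),\ -1,\ 0 \right\}, \]
while the corresponding eigenvectors are the rows of the matrix
\[ \begin{bmatrix}
1 & -\frac{-3-\sqrt{17}}{5+\sqrt{17}} & -\frac{-3-\sqrt{17}}{5+\sqrt{17}} & 1 \\
1 & -\frac{3-\sqrt{17}}{\sqrt{17}-5} & -\frac{3-\sqrt{17}}{\sqrt{17}-5} & 1 \\
-1 & 0 & 0 & 1 \\
0 & -1 & 1 & 0
\end{bmatrix}. \]

The largest eigenvalue is $\lambda_0 = \frac{1}{2}\left(1 + \sqrt{17}\right)$, which has the
corresponding eigenvector
\[ v_0 = \begin{bmatrix} 1 & -\frac{-3-\sqrt{17}}{5+\sqrt{17}} & -\frac{-3-\sqrt{17}}{5+\sqrt{17}} & 1 \end{bmatrix}. \]

Eigenvector centrality is usually normalized, so the entries of the
vector sum to one. This can be accomplished as
\[ \hat{v}_0 = \frac{v_0}{\sum_i v_{0_i}}. \]
In this example, our normalized eigenvector centrality is given by the
approximation
\[ \hat{v}_0 \approx \begin{bmatrix} 0.28 & 0.22 & 0.22 & 0.28 \end{bmatrix}. \]

This illustrates more clearly that vertices 1 and 4 have identical
(larger) eigenvector centrality scores and vertices 2 and 3 have
identical (smaller) eigenvector centrality scores. Let $x = (1,0,0,0)$ and
$y_k = M^k x$, and suppose that $y_k$ is normalized, so its components
sum to one. Then,
\[ y_1 \approx \begin{bmatrix} 0 \\ 0.33 \\ 0.33 \\ 0.33 \end{bmatrix}, \quad
   y_5 \approx \begin{bmatrix} 0.26 \\ 0.24 \\ 0.24 \\ 0.27 \end{bmatrix}, \quad
   y_{10} \approx \begin{bmatrix} 0.28 \\ 0.22 \\ 0.22 \\ 0.28 \end{bmatrix}. \]

It's easy to see that as $k \to \infty$, $y_k$ approaches $\hat{v}_0$ as expected.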
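
The convergence in Example 10.7 can be reproduced by repeatedly multiplying by $M$ and renormalizing, which is often called power iteration. The sketch below is a plain-Python illustration with our own helper names; renormalizing at every step gives the same vector as normalizing $M^k x$ once at the end, because the entries stay non-negative.

```python
def mat_vec(m, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]


def normalize(x):
    """Scale a vector so its entries sum to one."""
    total = sum(x)
    return [value / total for value in x]


def power_iteration(m, x, steps):
    """Return the normalized vector M^steps x."""
    for _ in range(steps):
        x = normalize(mat_vec(m, x))
    return x


M = [[0, 1, 1, 1],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [1, 1, 1, 0]]

for k in (1, 5, 10, 50):
    y = power_iteration(M, [1.0, 0.0, 0.0, 0.0], k)
    print(k, [round(value, 2) for value in y])
```

Starting from a different vertex changes the early iterates but not the limit, provided the starting vector is not orthogonal to $v_0$.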


10.2 Markov Chains and Random Walks


Remark 10.8. Appendix B provides an introduction to probabil- 
ity. While it is not necessary, it may make some of the following 
observations on Markov chains easier to understand for those with 
no background in probability theory. 


Definition 10.9 (Markov chain). A discrete-time Markov chain
is a pair $\mathcal{M} = (G, p)$, where $G = (V, E)$ is a directed graph, the
set of vertices is usually called the set of states, the set of edges is
called the transitions, and $p : E \to [0,1]$ is a probability assignment
function satisfying
\[ \sum_{v' \in N_o(v)} p(v, v') = 1 \tag{10.10} \]
for all $v \in V$. Here, $N_o(v)$ is the neighborhood reachable by an
out-edge from $v$. If there is no edge $(v, v') \in E$, then $p(v, v') = 0$.


Remark 10.10. There are also continuous-time Markov chains, but 
we will not discuss those here. See Ref. [99] for information on those 
models. For the remainder of this chapter, when we say Markov chain, 
we mean discrete-time Markov chain. 
Fig. 10.2 A Markov chain is a directed graph to which we assign edge
probabilities so that the sum of the probabilities of the out-edges at any vertex is
always 1.


Example 10.11. A simple Markov chain is shown in Fig. 10.2. There 
are two states (vertices) denoted by 1 and 2. The fractions next to 
the directed edges are their assigned probabilities. We can think of a 
Markov chain as governing the evolution of a state as follows. Think 
of the states as cities with airports. If there is an out-edge connecting 
the current city to another city, then we can fly from our current city 
to this next city, and we do so with some probability. When we do fly 
(or perhaps don’t fly and remain at the current location), our state 
updates to the next city. In this case, time is treated discretely. 

A walk along the vertices of a Markov chain governed by the 
probability function is called a random walk. Computing the proba- 
bility of taking a specific random walk is an exercise in conditional 
probability (see Appendix B). 


Definition 10.12 (Stochastic matrix). Let $\mathcal{M} = (G,p)$ be a
Markov chain. Then, the stochastic matrix (or probability transition
matrix) of $\mathcal{M}$ is the matrix $M$ with entries
\[ M_{ij} = p(v_i, v_j). \tag{10.11} \]


Example 10.13. The stochastic matrix for the Markov chain in
Fig. 10.2 is the $2 \times 2$ matrix $M$ whose entry $M_{ij}$ is the probability
assigned to the edge from state $i$ to state $j$ in the figure.

Thus, a stochastic matrix is very much like an adjacency matrix
where the 0's and 1's, indicating the absence and presence of an
edge, respectively, are replaced by the probabilities associated with
the edges in the Markov chain.
Definition 10.14 (State probability vector). If $\mathcal{M} = (G,p)$ is a
Markov chain with $n$ states (vertices), then a state probability vector
is a vector $x \in \mathbb{R}^{n \times 1}$ such that $x_1 + x_2 + \cdots + x_n = 1$ and $x_i \geq 0$ for
$i = 1, \dots, n$, and $x_i$ represents the probability that we are in state $i$
(at vertex $i$).


Remark 10.15. The following theorem can be proved in exactly the 
same way that Theorem 9.4 is proved. 


Theorem 10.16. Let $\mathcal{M} = (G,p)$ be a Markov chain with $n$ states
(vertices). Let $x^{(0)} \in \mathbb{R}^{n \times 1}$ be an (initial) state probability vector.
Then, assuming we take a random walk of length $k$ in $\mathcal{M}$ using the
initial state probability vector $x^{(0)}$, the final state probability vector is
\[ x^{(k)} = \left(M^T\right)^k x^{(0)}. \tag{10.12} \]


Remark 10.17. This can be written without the transpose. Let
$x^{(0)} \in \mathbb{R}^{1 \times n}$; that is, $x^{(0)}$ is a row vector. Then,
\[ x^{(k)} = x^{(0)} M^k, \tag{10.13} \]
with $x^{(k)} \in \mathbb{R}^{1 \times n}$.


Example 10.18. Consider the Markov chain in Fig. 10.2. The state
vector
\[ x^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
states that we start in State 1 with probability 1. From
Example 10.13, we know what $M$ is. Then, it is easy to see that
$x^{(1)} = \left(M^T\right)^1 x^{(0)}$ is the first row of $M$ written as a column vector.
This is precisely the state probability vector we would expect after
a random walk of length 1 in $\mathcal{M}$.
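
Equation (10.12) is straightforward to evaluate by machine. The sketch below is an illustration only: the two-state transition probabilities are hypothetical values chosen for this example (they are not read from Fig. 10.2), and the helper names `step` and `walk` are our own. Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction as F


def step(m, x):
    """One step of the chain: return M^T x for a column state vector x."""
    n = len(m)
    return [sum(m[i][j] * x[i] for i in range(n)) for j in range(n)]


def walk(m, x, k):
    """Return the state probability vector after a random walk of length k."""
    for _ in range(k):
        x = step(m, x)
    return x


# Hypothetical stochastic matrix: each row sums to 1.
M = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]

x0 = [F(1), F(0)]   # start in state 1 with probability 1
x1 = walk(M, x0, 1)
x2 = walk(M, x0, 2)
print(x1, x2)
```

After one step, the state vector is just the first row of the matrix, as in Example 10.18; after two steps, the entries still sum to one.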


Definition 10.19 (Stationary probability vector). Let $\mathcal{M} =
(G,p)$ be a Markov chain. Then, a vector $x^*$ is stationary for $\mathcal{M}$ if
\[ x^* = M^T x^*. \tag{10.14} \]


Remark 10.20. Equation (10.14) says that a probability distribution
on the states of a Markov chain is stationary if it doesn't change
as a result of taking a random step in the Markov chain.


Remark 10.21. Equation (10.14) should look familiar. It says that
\(\mathbf{M}^T\) has an eigenvalue of 1 and a corresponding eigenvector
whose entries are all non-negative (so that the vector can be scaled so
its components sum to 1). Furthermore, this looks very similar to the
equation we used for eigenvector centrality.


Lemma 10.22. Let M = (G,p) be a Markov chain with n states
and with stochastic matrix \(\mathbf{M}\). Then,

\[ \sum_{j} \mathbf{M}_{ij} = 1 \tag{10.15} \]

for all \(i = 1,\dots,n\).


Lemma 10.23. Let M = (G,p) be a Markov chain with n states and
with the stochastic matrix \(\mathbf{M}\). If G is strongly connected,
then \(\mathbf{M}\) and \(\mathbf{M}^T\) are irreducible.


Proof. If G is strongly connected, then there is a directed walk
from any vertex \(v_i\) to any other vertex \(v_j\) in V, the vertex
set of G. Consider any length-k walk connecting \(v_i\) to \(v_j\)
(such a walk exists for some k). Let \(\mathbf{e}_i\) be a vector with
1 in its ith component and 0 everywhere else. Then,
\((\mathbf{M}^T)^k \mathbf{e}_i\) is the final state probability vector
associated with a walk of length k starting at vertex \(v_i\). Since
there is a walk of length k from \(v_i\) to \(v_j\), we know that the
jth element of this vector must be nonzero. That is,

\[ \mathbf{e}_j^T \left(\mathbf{M}^T\right)^k \mathbf{e}_i > 0, \]

where \(\mathbf{e}_j\) is defined just as \(\mathbf{e}_i\) is but with
1 at the jth position. Thus, \(\left[(\mathbf{M}^T)^k\right]_{ji} > 0\)
for some k for every (i,j) pair, and thus, \(\mathbf{M}^T\) is
irreducible. The fact that \(\mathbf{M}\) is irreducible follows
immediately from the fact that
\((\mathbf{M}^T)^k = (\mathbf{M}^k)^T\). This completes the proof.


Theorem 10.24 (Perron-Frobenius theorem redux). If \(\mathbf{M}\) is
an irreducible matrix, then \(\mathbf{M}\) has an eigenvalue
\(\lambda_0 > 0\) that satisfies all the properties in Theorem 9.19 and

\[ \min_i \sum_j \mathbf{M}_{ij} \leq \lambda_0 \leq \max_i \sum_j \mathbf{M}_{ij}. \]
Theorem 10.25. Let M = (G,p) be a Markov chain with the
stochastic matrix \(\mathbf{M}\). If \(\mathbf{M}^T\) is irreducible,
then M has a unique stationary probability distribution.


Proof. From Theorem A.80, we know that \(\mathbf{M}\) and
\(\mathbf{M}^T\) have identical eigenvalues. By the Perron-Frobenius
theorem, \(\mathbf{M}\) has a largest positive eigenvalue
\(\lambda_0\) that satisfies

\[ \min_i \sum_j \mathbf{M}_{ij} \leq \lambda_0 \leq \max_i \sum_j \mathbf{M}_{ij}. \]

By Lemma 10.22, we know that

\[ \min_i \sum_j \mathbf{M}_{ij} = \max_i \sum_j \mathbf{M}_{ij} = 1. \]

Therefore, by the squeezing lemma, \(\lambda_0 = 1\). The fact that
\(\mathbf{M}^T\) has exactly one strictly positive eigenvector
\(\mathbf{v}_0\) corresponding to \(\lambda_0 = 1\) means that

\[ \mathbf{M}^T \mathbf{v}_0 = \mathbf{v}_0. \tag{10.16} \]

Thus, after scaling so that its entries sum to 1, \(\mathbf{v}_0\) is
the unique stationary state probability vector for M = (G,p). This
completes the proof.


Remark 10.26. Let M be a Markov chain. If \(\lambda_0 = 1\) has a
strictly greater absolute value than all other eigenvalues of
\(\mathbf{M}^T\), then we can strengthen Theorem 10.25 to say that
\(\mathbf{M}^k\) converges, as \(k \to \infty\), to a rank 1 matrix
whose rows are all equal to the stationary distribution vector. This is
not true of all Markov chains (because of the eigenvalue requirement).
Markov chains for which this property holds may be called ergodic; they
are aperiodic (meaning that a state pattern does not repeat in a
cycle), and their directed graphs are strongly connected.


10.3 PageRank


Definition 10.27 (Induced Markov chain). Let G = (V,E) be
a graph. Then, the Markov chain induced from G is the one obtained
by defining a new directed graph G' = (V,E'), with each edge
\(\{v,v'\} \in E\) replaced by two directional edges \((v,v')\) and
\((v',v)\) in E' and defining the probability function p so that

\[ p(v,v') = \frac{1}{\deg_{\mathrm{out}_{G'}} v}. \tag{10.17} \]


Example 10.28. An induced Markov chain is shown in Fig. 10.3.
The Markov chain in the figure has the stationary state probability
vector

\[ \mathbf{x}^* = \begin{bmatrix} 3/8 \\ 1/4 \\ 1/4 \\ 1/8 \end{bmatrix}, \]

which is the eigenvector corresponding to the eigenvalue 1 in the
matrix \(\mathbf{M}^T\). Arguing as we did in the proof of Theorem 10.5
and Example 10.7, we could expect that for any state vector
\(\mathbf{x}\), we would have

\[ \lim_{k \to \infty} \left(\mathbf{M}^T\right)^k \mathbf{x} = \mathbf{x}^*. \]

We would be correct. When this convergence happens quickly (where
we leave quickly poorly defined), the graph is said to have a fast
mixing property.
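The stationary vector of an induced Markov chain can be verified directly. The sketch below is an illustration under stated assumptions: the edge list is our reading of the four-vertex graph in Fig. 10.3 (vertex 1 adjacent to vertices 2, 3, and 4, and vertices 2 and 3 adjacent), and the helper names `induced_chain` and `step` are hypothetical. For an induced chain, the vector of degrees divided by the total degree is a natural candidate for the stationary vector, and the code checks it exactly.

```python
from fractions import Fraction as F

def induced_chain(n, edges):
    """Stochastic matrix of the Markov chain induced from an undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1      # each edge becomes two directed edges
    # Every edge leaving v gets probability 1 / deg_out(v).
    return [[F(a, sum(row)) for a in row] for row in adj]

def step(M, x):
    """One random step: x <- M^T x."""
    n = len(M)
    return [sum(M[i][j] * x[i] for i in range(n)) for j in range(n)]

# Assumed edge list (vertices 1..4 are stored as 0..3).
edges = [(0, 1), (0, 2), (1, 2), (0, 3)]
M = induced_chain(4, edges)

degrees = [3, 2, 2, 1]
x_star = [F(d, sum(degrees)) for d in degrees]   # (3/8, 1/4, 1/4, 1/8)
is_stationary = step(M, x_star) == x_star

# Start from a point mass on vertex 4 and let the walk mix.
x = [F(0), F(0), F(0), F(1)]
for _ in range(50):
    x = step(M, x)
gap = max(abs(float(a - b)) for a, b in zip(x, x_star))
```

After 50 steps, `gap` is tiny, which is the convergence described in the example.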


Fig. 10.3. An induced Markov chain (right panel, "Induced Markov
Chain") is constructed from a graph (left panel, "Original Graph") by
replacing every edge with a pair of directed edges (going in opposite
directions) and assigning a probability equal to the reciprocal of the
out-degree of each vertex to every edge leaving that vertex.




If we used the stationary probability of a vertex in the induced 
Markov chain as a measure of importance, then, clearly, vertex 1 
would be the most important, followed by vertices 2 and 3 and, lastly, 
vertex 4. We can compare this with the eigenvector centrality
measure, which assigns a rank vector of

\[ \mathbf{x}^+ = \begin{bmatrix} 0.315 \\ 0.270 \\ 0.270 \\ 0.145 \end{bmatrix}. \]


The eigenvector centrality gives the same ordinal ranking as using the 
stationary state probability vector, but there are subtle differences 
in the values produced by these two ranking schemes. This leads us 
to PageRank [27]. 


Derivation 10.29 (PageRank). Consider a collection of web
pages each with links. We can construct a directed graph G with
the vertex set V consisting of the web pages and edge set E consisting
of the directed links among the pages. Imagine a random web
surfer who will click among these web pages, following links until a
dead end is reached (a page with no outbound links). In this case,
the web surfer will type a new URL in (chosen from the set of web
pages available) and the process will continue.

From this model, we can induce a Markov chain in which we define
a new graph G' with edge set E' so that if \(v \in V\) has an
out-degree of 0, then we create an edge in E' to every other vertex in
V, and we then define

\[ p(v,v') = \frac{1}{\deg_{\mathrm{out}_{G'}} v} \tag{10.18} \]

exactly as before. In the absence of any further insight, the PageRank 
algorithm simply assigns to each web page a score equal to the sta- 
tionary probability of the corresponding state in the induced Markov 
chain. For the remainder of this derivation, let M be the stochastic 
matrix of the induced Markov chain. 

PageRank assumes that surfers will get bored after some number
of clicks (or new URLs) and will stop (and move to a new page) with
some probability \(1 - d\), where \(d \in [0,1]\) is called the damping
factor. This factor is usually estimated. Assuming that there are n web
pages, let \(\mathbf{r} \in \mathbb{R}^{n \times 1}\) be the vector of
PageRank scores, one for each page. Taking boredom into account
leads to a new expression for rank (similar to Eq. (10.1) for
eigenvector centrality):

\[ r_i = \frac{1-d}{n} + d\left(\sum_{j=1}^{n} \mathbf{M}_{ji} r_j\right) \quad \text{for } i = 1,\dots,n. \tag{10.19} \]


Here, the d term acts like a damping factor on walks through the 
Markov chain. In essence, it stalls people as they walk, making it 
less likely that a searcher will keep walking forever. The original 
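system of equations in Eq. (10.19) can be written in matrix form as

\[ \mathbf{r} = \left(\frac{1-d}{n}\right)\mathbf{1} + d\,\mathbf{M}^T \mathbf{r}, \tag{10.20} \]

where \(\mathbf{1}\) is an \(n \times 1\) vector consisting of all 1's.
It is easy to see that when d = 1, \(\mathbf{r}\) is precisely the
stationary state probability vector for the induced Markov chain. When
\(d \neq 1\), \(\mathbf{r}\) is usually computed iteratively by
starting with an initial value of \(r_i^{(0)} = 1/n\) for all
\(i = 1,\dots,n\) and computing

\[ \mathbf{r}^{(k)} = \left(\frac{1-d}{n}\right)\mathbf{1} + d\,\mathbf{M}^T \mathbf{r}^{(k-1)}. \]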


The reason is that for large n, the analytic solution

\[ \mathbf{r} = \left(\mathbf{I}_n - d\,\mathbf{M}^T\right)^{-1} \left(\frac{1-d}{n}\right)\mathbf{1} \tag{10.21} \]

is not computationally tractable.¹

¹ Note that \((\mathbf{I}_n - d\,\mathbf{M}^T)^{-1}\) denotes a matrix
inverse. We should note that for stochastic matrices, this inverse is
guaranteed to exist (when d < 1). For those interested, please
consult Refs. [97,100,101].


Example 10.30. Consider the induced Markov chain in Fig. 10.3,
and suppose we wish to compute PageRank on these vertices with
d = 0.85 (which is a common assumption). We might begin with


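\[ \mathbf{r}^{(0)} = \begin{bmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{bmatrix}. \]

We would then compute

\[ \mathbf{r}^{(1)} = \left(\frac{1-d}{n}\right)\mathbf{1} + d\,\mathbf{M}^T \mathbf{r}^{(0)} \approx \begin{bmatrix} 0.4625 \\ 0.2146 \\ 0.2146 \\ 0.1083 \end{bmatrix}. \]

We would repeat this again to obtain

\[ \mathbf{r}^{(2)} = \left(\frac{1-d}{n}\right)\mathbf{1} + d\,\mathbf{M}^T \mathbf{r}^{(1)} \approx \begin{bmatrix} 0.312 \\ 0.260 \\ 0.260 \\ 0.169 \end{bmatrix}. \]

This would continue until the difference between the values of
\(\mathbf{r}^{(k)}\) and \(\mathbf{r}^{(k-1)}\) was small. The final
solution would be close to the exact solution:

\[ \mathbf{r} = \begin{bmatrix} 0.367 \\ 0.246 \\ 0.246 \\ 0.141 \end{bmatrix}. \]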


Note this is (again) very close to the stationary probabilities and the 
eigenvector centralities we observed earlier. This vector is normalized 
so that all the entries sum to 1. 
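The iteration in this example is short enough to run directly. The sketch below is our own illustration rather than code from the text. The stochastic matrix is written out under the assumption that Fig. 10.3 is the graph in which vertex 1 is adjacent to vertices 2, 3, and 4 and vertices 2 and 3 are adjacent. The function name `pagerank` and the stopping tolerance are hypothetical choices.

```python
def pagerank(M, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate r <- ((1 - d)/n) 1 + d M^T r and return every iterate."""
    n = len(M)
    r = [1.0 / n] * n                  # r^(0): uniform scores
    history = [r]
    for _ in range(max_iter):
        # Component i of M^T r is sum_j M[j][i] * r[j], as in Eq. (10.19).
        r_new = [(1 - d) / n + d * sum(M[j][i] * r[j] for j in range(n))
                 for i in range(n)]
        history.append(r_new)
        if max(abs(a - b) for a, b in zip(r_new, r)) < tol:
            break                      # successive iterates agree
        r = r_new
    return history

# Assumed stochastic matrix of the induced Markov chain (rows sum to 1).
M = [[0, 1/3, 1/3, 1/3],
     [1/2, 0, 1/2, 0],
     [1/2, 1/2, 0, 0],
     [1, 0, 0, 0]]

history = pagerank(M)
r1, r2, r_final = history[1], history[2], history[-1]
```

Because the update preserves the sum of the entries, every iterate remains a probability vector, which is why no renormalization step appears in the loop.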


10.4 The Graph Laplacian 
Remark 10.31. In this last section, we return to simple graphs and
discuss the Laplacian matrix, which can be used to partition the
vertices of a graph in a sensible way.




Fig. 10.4 A set of triangle graphs. 


Definition 10.32 (Degree matrix). Let G = (V,E) be a simple
graph with \(V = \{v_1,\dots,v_n\}\). The degree matrix is the diagonal
matrix \(\mathbf{D}\) with the degree of each vertex on the diagonal.
That is, \(\mathbf{D}_{ii} = \deg(v_i)\) and \(\mathbf{D}_{ij} = 0\) if
\(i \neq j\).


Example 10.33. Consider the graph in Fig. 10.4. It has the degree
matrix

\[ \mathbf{D} = \begin{bmatrix}
2 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 2
\end{bmatrix} \]

because each of its vertices has a degree of 2.


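Definition 10.34 (Laplacian matrix). Let G = (V,E) be a simple
graph with \(V = \{v_1,\dots,v_n\}\), adjacency matrix \(\mathbf{M}\),
and degree matrix \(\mathbf{D}\). The Laplacian matrix is the matrix
\(\mathbf{L} = \mathbf{D} - \mathbf{M}\).


Example 10.35. The graph shown in Fig. 10.4 has the adjacency
matrix

\[ \mathbf{M} = \begin{bmatrix}
0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 0
\end{bmatrix}. \]

Therefore, it has the Laplacian

\[ \mathbf{L} = \mathbf{D} - \mathbf{M} = \begin{bmatrix}
2 & -1 & -1 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 & 0 \\
-1 & -1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & -1 & -1 \\
0 & 0 & 0 & -1 & 2 & -1 \\
0 & 0 & 0 & -1 & -1 & 2
\end{bmatrix}. \]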

Remark 10.36. Note the row sum of each row in the Laplacian 
matrix is zero. The Laplacian matrix is also symmetric. 


Proposition 10.37. The Laplacian matrix L of a simple graph G 
is symmetric. 


Lemma 10.38. The row sum of the adjacency matrix of a simple 
graph is the degree of the corresponding vertex. 


Corollary 10.39. The row sum for each row of the Laplacian matrix 
of a simple graph is zero. 
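These properties are easy to confirm on a small case. The following Python sketch (our own illustration, with a hypothetical helper named `laplacian`) builds the degree, adjacency, and Laplacian matrices for two disjoint triangles, which is the graph described by the adjacency matrix in Example 10.35, and checks symmetry and the zero row sums.

```python
def laplacian(n, edges):
    """Return (D, M, L) for a simple graph on vertices 0..n-1."""
    M = [[0] * n for _ in range(n)]
    for u, v in edges:
        M[u][v] = M[v][u] = 1          # adjacency matrix is symmetric
    # Row sums of M are the vertex degrees (Lemma 10.38).
    D = [[sum(M[i]) if i == j else 0 for j in range(n)] for i in range(n)]
    L = [[D[i][j] - M[i][j] for j in range(n)] for i in range(n)]
    return D, M, L

# Two disjoint triangles on vertices {0, 1, 2} and {3, 4, 5}.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
D, M, L = laplacian(6, edges)

row_sums = [sum(row) for row in L]
symmetric = all(L[i][j] == L[j][i] for i in range(6) for j in range(6))
```

The row sums vanish because each diagonal entry (the degree) is cancelled by the -1 entries for that vertex's neighbors.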


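Theorem 10.40. Let \(\mathbf{L}\) be the Laplacian matrix of a simple
graph G. Suppose \(\mathbf{L} \in \mathbb{R}^{n \times n}\); then
\(\mathbf{1} = (1,1,\dots,1) \in \mathbb{R}^n\) is an eigenvector
of \(\mathbf{L}\) with eigenvalue 0.


Proof. Let

\[ \mathbf{L} = \begin{bmatrix}
d_{11} & -a_{12} & -a_{13} & \cdots & -a_{1n} \\
-a_{21} & d_{22} & -a_{23} & \cdots & -a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-a_{n1} & -a_{n2} & -a_{n3} & \cdots & d_{nn}
\end{bmatrix}, \tag{10.22} \]

where \(d_{ii}\) is the degree of \(v_i\) and the \(a_{ij}\) are the
entries of the adjacency matrix. Let
\(\mathbf{v} = \mathbf{L} \cdot \mathbf{1}\). Recall that
\(\mathbf{L}_{i\cdot}\) is the ith row of \(\mathbf{L}\). The ith
component of \(\mathbf{v}\) is

\[ v_i = \mathbf{L}_{i\cdot} \cdot \mathbf{1} = d_{ii} - \sum_{j \neq i} a_{ij} = 0 \tag{10.23} \]

by Corollary 10.39. We have shown that \(v_i = 0\) for
\(i = 1,\dots,n\); therefore, \(\mathbf{v} = \mathbf{0}\). Thus,

\[ \mathbf{L} \cdot \mathbf{1} = \mathbf{0} = 0 \cdot \mathbf{1}. \]

It follows that \(\mathbf{1}\) is an eigenvector with eigenvalue 0.
This completes the proof.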


Remark 10.41. It is worth noting that 0 can be an eigenvalue, but 
the zero vector 0 cannot be an eigenvector. 


Remark 10.42. We know from the principal axis theorem
(Theorem A.97) that \(\mathbf{L}\) must have n linearly independent
(and orthogonal) eigenvectors that form a basis for \(\mathbb{R}^n\)
since it is a real symmetric matrix.


Theorem 10.43. Let G = (V,E) be a graph with
\(V = \{v_1,\dots,v_n\}\) and with Laplacian \(\mathbf{L}\). Then, the
(algebraic) multiplicity of the eigenvalue 0 is equal to the number of
components of G.


Proof. Suppose that G has k components; order the components as
\(H_1,\dots,H_k\), and suppose that component \(H_i\) has \(n_i\)
vertices. Then, \(n_1 + n_2 + \cdots + n_k = n\). Each component has
its own Laplacian matrix \(\mathbf{L}_i\) for \(i = 1,\dots,k\), and
the Laplacian matrix of G is the block matrix

\[ \mathbf{L} = \begin{bmatrix}
\mathbf{L}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{L}_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{L}_k
\end{bmatrix}. \]

Let \(\mathbf{1}_i\) be a column vector of 1's that is the eigenvector
of \(\mathbf{L}_i\) with eigenvalue 0. We can construct an eigenvector
of \(\mathbf{L}\) as
\(\mathbf{v}_i = (\mathbf{0},\dots,\mathbf{1}_i,\mathbf{0},\dots,\mathbf{0})\)
with eigenvalue 0. This argument holds for all
\(\mathbf{L}_1,\dots,\mathbf{L}_k\). Thus, \(\mathbf{L}\) has an
eigenvalue of 0 with multiplicity of at least k.

Now, suppose \(\mathbf{v}\) is an eigenvector with eigenvalue 0. Then,

\[ \mathbf{L}\mathbf{v} = \mathbf{0}. \]

That is, \(\mathbf{v}\) is in the null space of \(\mathbf{L}\). We have
so far proved that the nullity of \(\mathbf{L}\) is at least k since
each eigenvector \(\mathbf{v}_i\) is linearly independent of any other
eigenvector \(\mathbf{v}_j\) for \(i \neq j\). Thus, the basis of
the null space of \(\mathbf{L}\) contains at least k vectors. On the
other hand, it is clear by construction that the rank of the Laplacian
matrix \(\mathbf{L}_i\) is exactly \(n_i - 1\). The structure of
\(\mathbf{L}\) ensures that the rank of \(\mathbf{L}\) is

\[ (n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = n - k. \]

From the rank-nullity theorem (see Theorem A.66), we know that
the rank of \(\mathbf{L}\) plus the nullity of \(\mathbf{L}\) must be
n. Therefore, the nullity of \(\mathbf{L}\) is precisely k. Since
\(\mathbf{L}\) is real and symmetric, the algebraic multiplicity of an
eigenvalue equals the dimension of its eigenspace. That is, the
multiplicity of the eigenvalue 0 is precisely the number of
components. This completes the proof.
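The rank-nullity argument in this proof can be mirrored computationally. The sketch below is an illustration only: it computes the rank of a Laplacian by exact Gaussian elimination over the rationals and reads off the nullity as n minus the rank. The helper names `laplacian` and `rank` and the two test graphs are our own choices.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Laplacian L = D - M of a simple graph on vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1
        L[v][u] -= 1
        L[u][u] += 1
        L[v][v] += 1
    return L

def rank(A):
    """Rank of a matrix via exact Gaussian elimination (row echelon form)."""
    A = [[Fraction(a) for a in row] for row in A]
    rows, cols = len(A), len(A[0])
    r = 0                                   # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r

triangles = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
nullity_two_components = 6 - rank(laplacian(6, triangles))

# Adding one edge between the triangles joins them into one component.
nullity_connected = 6 - rank(laplacian(6, triangles + [(2, 3)]))
```

The nullity drops from 2 to 1 when the two triangles are joined, matching the component count in each case.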


Remark 10.44. We state the following fact without proof. Its proof
can be found in Ref. [39, Lemma 13.1.1]. It is a consequence of the
fact that the Laplacian matrix is positive semi-definite, meaning that
for any \(\mathbf{v} \in \mathbb{R}^n\), the (scalar) quantity

\[ \mathbf{v}^T \mathbf{L} \mathbf{v} \geq 0. \]


Lemma 10.45. Let G be a graph with Laplacian matrix \(\mathbf{L}\).
The eigenvalues of \(\mathbf{L}\) are all non-negative.


Definition 10.46 (Fiedler value/vector). Let G be a simple
graph with n vertices with Laplacian \(\mathbf{L}\). Suppose
\(\mathbf{L}\) has eigenvalues \(\{\lambda_n,\dots,\lambda_1\}\)
ordered from largest to smallest (i.e., so that
\(\lambda_n \geq \lambda_{n-1} \geq \cdots \geq \lambda_1\)). The
second smallest eigenvalue \(\lambda_2\) is called the Fiedler value,
and its corresponding eigenvector is called the Fiedler vector.


Proposition 10.47. Let G be a graph with Laplacian matrix
\(\mathbf{L}\). The Fiedler value \(\lambda_2 > 0\) if and only if G is
connected.


Proof. If G is connected, it has one component; therefore, by
Theorem 10.43, the multiplicity of the 0 eigenvalue is 1. By
Lemma 10.45, \(\lambda_2 > 0\). On the other hand, suppose that
\(\lambda_2 > 0\); then, necessarily, \(\lambda_1 = 0\) and has a
multiplicity of 1, so by Theorem 10.43, G has exactly one component
and is connected.


Remark 10.48. We state a remarkable fact about the Fiedler vec- 
tor, whose proof can be found in Ref. [102]. 


Theorem 10.49. Let G = (V,E) be a simple graph with
\(V = \{v_1,\dots,v_n\}\) and with Laplacian matrix \(\mathbf{L}\). If
\(\mathbf{v}\) is the eigenvector corresponding to the Fiedler value
\(\lambda_2\), then the subgraph generated by the set of vertices

\[ V(\mathbf{v}, c) = \{ v_i \in V : \mathbf{v}_i \geq c \} \]

is a connected subgraph of G.


Remark 10.50. In particular, this means that if c = 0, then the 
vertices whose indices correspond to the positive entries in v allow 
for a natural bipartition of the vertices of G. This bipartition is called 
a spectral clustering of G or, sometimes, a Cheeger cut. This type of 
clustering can be useful for finding groupings of individuals in social 
networks. 


Example 10.51. Consider the social network shown to the left in
Fig. 10.5. If we compute the Fiedler value for this graph, we see that
it is \(\lambda_2 = 3 - \sqrt{5} > 0\) since the graph is connected.
The corresponding Fiedler vector is


1 (-1- V5) 
ee ne 
$ (V5 —3) | —0.382 
1 if 
Hon ea 
1 


Setting c = 0 and assuming that the vertices are in alphabetical 
order, a natural partition of this social network is 


\(V_1\) = {Alice, Bob, Cheryl} and
\(V_2\) = {David, Edward, Finn}.


That is, we have grouped the vertices together with negative entries 
in the Fiedler vector and grouped the vertices together with positive 
entries in the Fiedler vector. This is illustrated in Fig. 10.5. It is 
worth noting that if an entry is 0 (i.e., on the border), that vertex 
can be placed in either partition or placed in a partition of its own. It 
usually bridges two distinct vertex groups together within the graph 
structure. For large graphs, this process can be iteratively repeated 
to produce a spectral clustering. 
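A spectral bipartition can be sketched without a linear algebra library. The code below is an illustration under stated assumptions, not the method used to produce Fig. 10.5. The graph is a hypothetical one (two triangles joined by a single edge), and the Fiedler vector is approximated by power iteration on cI - L with the all-ones direction projected out, which is a technique we have swapped in for a full eigenvalue solver. Here c is twice the maximum degree, which is at least as large as the largest eigenvalue of L.

```python
import math

def laplacian(n, edges):
    """Laplacian L = D - M of a simple graph on vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1
        L[v][u] -= 1
        L[u][u] += 1
        L[v][v] += 1
    return L

def fiedler_vector(L, iters=500):
    """Approximate the Fiedler vector by projected power iteration."""
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))   # shift: c >= largest eigenvalue
    v = [float(i + 1) for i in range(n)]     # arbitrary starting vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]            # remove the all-ones component
        # v <- (cI - L) v; the smallest remaining eigenvalue of L dominates.
        v = [c * v[i] - sum(L[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
L = laplacian(6, edges)
v = fiedler_vector(L)

# Rayleigh quotient v^T L v (v has unit length) estimates the Fiedler value.
lambda_2 = sum(v[i] * L[i][j] * v[j] for i in range(6) for j in range(6))

V1 = {i for i in range(6) if v[i] < 0}       # negative entries
V2 = {i for i in range(6) if v[i] >= 0}      # non-negative entries
```

The sign of an eigenvector is arbitrary, so which triangle lands in `V1` depends on the starting vector; only the split itself is meaningful. The iteration relies on the Fiedler value being a simple eigenvalue, which holds for this graph.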
Fig. 10.5 (a) A simple social network and (b) a graph partition using
positive and negative entries of the Fiedler vector.


10.5 Chapter Notes 


The use of eigenvectors for ranking far predates its use by Google in
the early 2000s [103]. Dating at least to the 1940s with the work of
Seeley [104], it is the social scientists who made the greatest use of
these
eigenvector-based ranking mechanisms. In 1953, the sociologist Katz 
[26] developed “Katz centrality,” which bears a remarkable similarity 
to PageRank. Just prior to Katz’s work, Wei [105] had used the 
Perron—Frobenius theorem to put ranking on a firm mathematical 
footing while specifically focusing on sports team rankings. However, 
it was Berge [106] who recognized that this approach could be applied 
to all (directed) graphs. Interestingly, Brin and Page’s publications 
on PageRank [27,107] lack citation of any of this prior work. We 
should note that PageRank was extensively covered in the media 
and the academic literature. See (for example) Refs. [108, 109]. 

Markov chains were first studied (literally invented) by Rus- 
sian mathematician Andrey Markov [1]. Together with Kolmogorov, 
Markov helped establish the foundations of probability theory and 
stochastic processes. Markov chains have been applied widely in engi- 
neering and form the basis of hidden Markov models [110], an early 
approach to statistical machine learning and a model used extensively 
in natural language processing. 

Miroslav Fiedler, after whom the Fiedler vector was named, 
was a Czech mathematician whose work in algebraic graph the- 
ory [102] paved the way for modern spectral clustering meth- 
ods. Spectral clustering has been independently developed in both 
computer science [111] and network science (physics) [9,112]. However,
the graph Laplacian has far more applications than just clustering.
The name itself hints at its relation to the Laplacian operator
\(\nabla^2\), which appears in second-order differential equations
(such as the heat equation). In particular, if G is a connected simple
graph with Laplacian \(\mathbf{L}\), then the graph heat equation is
\(\dot{\mathbf{u}} = -k\mathbf{L}\mathbf{u}\). It has solutions similar
to the continuous heat equation
\(\partial u/\partial t = k\nabla^2 u\) but on
the discrete structure of the graph. This can be useful for solving
heat equations with complex boundary conditions. See, for exam- 
ple, Ref. [113] for a specific application. The discrete Laplacian also 
emerges in the study of consensus on networks, where it frequently 
drives the dynamics (see, for example, Ref. [114]). 


10.6 Exercises 


Exercise 10.1 

Show that Theorem 10.5 does not hold if there is some other eigenvalue
\(\lambda\) of \(\mathbf{M}\) so that \(|\lambda_0| = |\lambda|\). To
do this, consider the path graph with three vertices. Find its
adjacency matrix, eigenvalues, and principal eigenvector, and confirm
that the theorem does not hold in this case.


Exercise 10.2 
Prove Theorem 10.16. [Hint: Use the same inductive argument from 
the proof of Theorem 9.4.] 


Exercise 10.3 
Prove Lemma 10.22. 


Exercise 10.4 
Draw the Markov chain with stochastic matrix

\[ \mathbf{M} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}. \]

Show that \(\lim_{k \to \infty} \mathbf{M}^k\) does not converge but
this Markov chain does have a stationary distribution.
Exercise 10.5
Consider the following Markov chain.

[Figure: Markov chain diagram; one edge label reads 1/2.]


Suppose this is the induced Markov chain from four web pages. Com- 
pute the PageRank of these web pages using d = 0.85. 


Exercise 10.6 

Find an expression for \(\mathbf{r}^{(2)}\) in terms of
\(\mathbf{r}^{(0)}\). Explain how the damping factor occurs and how it
decreases the chance of taking long walks through the induced Markov
chain. Can you generalize your expression for \(\mathbf{r}^{(2)}\) to
an expression for \(\mathbf{r}^{(k)}\) in terms of
\(\mathbf{r}^{(0)}\)?


Exercise 10.7 
Prove Proposition 10.37. 


Exercise 10.8 
Prove Lemma 10.38. 


Exercise 10.9 
Find a spectral bipartition of the following graph. 
Chapter 11


A Brief Introduction 
to Linear Programming 


11.1 Introduction and Rationale 


Remark 11.1 (Chapter goals). Many graph-theoretic problems 
can be expressed as linear optimization (sometimes called linear pro- 
gramming) problems. Furthermore, the proofs of some of the most 
fundamental theorems of graph theory are greatly simplified by the 
use of a linear optimization formulation. 

Even though it seems as if we’re going to go far off topic, we 
use this chapter to introduce linear optimization and its fundamen- 
tal results. We then use these results to prove the max-flow/min-cut 
theorem, thereby illustrating the link between the theory of opti- 
mization and the theory of graphs. 


11.2 Linear Programming: Notation 


Definition 11.2 (Linear programming problem). A linear
programming problem is an optimization problem of the form

\[ \begin{aligned}
\max\ & z(x_1,\dots,x_n) = c_1 x_1 + \cdots + c_n x_n \\
\text{s.t.}\ & a_{11}x_1 + \cdots + a_{1n}x_n \leq b_1 \\
& \qquad \vdots \\
& a_{m1}x_1 + \cdots + a_{mn}x_n \leq b_m \\
& h_{11}x_1 + \cdots + h_{1n}x_n = r_1 \\
& \qquad \vdots \\
& h_{l1}x_1 + \cdots + h_{ln}x_n = r_l.
\end{aligned} \tag{11.1} \]


Remark 11.3. We can use matrices to write these problems more 
compactly. Consider the following system of equations: 


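\[ \begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned} \tag{11.2} \]

Then, we can write this in matrix notation as

\[ \mathbf{A}\mathbf{x} = \mathbf{b}, \tag{11.3} \]

where \(\mathbf{A}_{ij} = a_{ij}\) for \(i = 1,\dots,m\),
\(j = 1,\dots,n\), \(\mathbf{x}\) is a column vector in
\(\mathbb{R}^n\) with entries \(x_j\), \(j = 1,\dots,n\), and
\(\mathbf{b}\) is a column vector in \(\mathbb{R}^m\) with entries
\(b_i\), \(i = 1,\dots,m\). If we replace the equalities
in Eq. (11.3) with inequalities, we can also express the systems of
inequalities in the form

\[ \mathbf{A}\mathbf{x} \leq \mathbf{b}. \tag{11.4} \]

Using this representation, we can write our general linear
programming problem using matrix and vector notation. Equation (11.1)
becomes

\[ \begin{aligned}
\max\ & z(\mathbf{x}) = \mathbf{c}^T\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} \leq \mathbf{b} \\
& \mathbf{H}\mathbf{x} = \mathbf{r}.
\end{aligned} \tag{11.5} \]

Here, \(\mathbf{c}^T\) is the transpose of the column vector
\(\mathbf{c}\).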
Definition 11.4. In Eq. (11.5), if we restrict some of the decision
variables (the \(x_i\)'s) so that they have integer (or discrete)
values, then the problem becomes a mixed integer linear programming
problem. If all of the variables are restricted to integer values, the
problem is an integer programming problem, and if every variable can
only take on the values 0 or 1, the program is called a 0-1 or binary
integer programming problem. There are many works on integer
programming, of which Ref. [115] is one.


11.3 Intuitive Solutions to Linear Programming 
Problems 


Example 11.5. Consider the problem of a toy company that produces
toy planes and toy boats. The toy company can sell its planes
for $10 and its boats for $8. It costs $3 in raw materials
to make a plane and $2 in raw materials to make a boat. A plane
requires 3 hours to make and 1 hour to finish, while a boat requires
1 hour to make and 2 hours to finish. The toy company knows it will
not sell any more than 35 planes per week. Furthermore, given the
number of workers, the company cannot spend any more than 160
hours per week finishing toys and 120 hours per week making toys.
The company wishes to maximize the profit it makes by choosing
how much of each toy to produce.

We can represent the profit maximization problem of the company
as a linear programming problem. Let \(x_1\) be the number of planes
the company will produce, and let \(x_2\) be the number of boats the
company will produce. The profit for each plane is $10 - $3 = $7 per
plane and the profit for each boat is $8 - $2 = $6 per boat. Thus,
the total profit the company will make is

\[ z(x_1,x_2) = 7x_1 + 6x_2. \tag{11.6} \]

The company can spend no more than 120 hours per week making
toys, and since a plane takes 3 hours to make and a boat takes 1 hour
to make, we have

\[ 3x_1 + x_2 \leq 120. \tag{11.7} \]

Likewise, the company can spend no more than 160 hours per week
finishing toys, and since it takes 1 hour to finish a plane and 2
hours to finish a boat, we have

\[ x_1 + 2x_2 \leq 160. \tag{11.8} \]

Finally, we know that \(x_1 \leq 35\) since the company will make no
more than 35 planes per week. Thus, the complete linear programming
problem is given as

\[ \begin{aligned}
\max\ & z(x_1,x_2) = 7x_1 + 6x_2 \\
\text{s.t.}\ & 3x_1 + x_2 \leq 120 \\
& x_1 + 2x_2 \leq 160 \\
& x_1 \leq 35 \\
& x_1 \geq 0 \\
& x_2 \geq 0.
\end{aligned} \tag{11.9} \]


Remark 11.6. To be precise, the linear programming problem in
Example 11.5 is not a true linear programming problem because we
don't want to manufacture a fractional number of boats or planes
and, therefore, \(x_1\) and \(x_2\) must really be drawn from the
integers and not the real numbers (a requirement for a linear
programming problem). However, we ignore this fact and assume that we
can indeed manufacture a fractional number of boats and planes.


Remark 11.7. Linear programs (LPs) with two variables can be
solved graphically by plotting the feasible region (the values of
\((x_1, x_2)\) that make the inequalities true) along with the level
curves of the objective function. We show that we can find a point in
the feasible region that maximizes the objective function using the
level curves of the objective function. We illustrate the method first
using the problem from Example 11.5.


Example 11.8 (Continuation of Example 11.5). To solve the
linear programming problem from Example 11.5 graphically, begin by
drawing the feasible region. That is, plot the inequalities
\(3x_1 + x_2 \leq 120\), \(x_1 + 2x_2 \leq 160\), \(x_1 \leq 35\), and
\(x_1, x_2 \geq 0\). This is shown in the blue shaded region of
Fig. 11.1.

Fig. 11.1 Feasible region and level curves of the objective function:
The shaded region in the plot is the feasible region and represents the
intersection of the five inequalities constraining the values of
\(x_1\) and \(x_2\). The optimal solution is the "last" point in the
feasible region that intersects a level set as we move in the direction
of increasing profit (the gradient of z).


After plotting the feasible region, the next step is to plot the level
curves of the objective function. In our problem, the level sets will
have the form

\[ 7x_1 + 6x_2 = c \implies x_2 = -\frac{7}{6}x_1 + \frac{c}{6}. \]

This is a set of parallel lines with slope -7/6 and intercept c/6,
where c can be varied as needed. In Fig. 11.1, they are shown in
colors ranging from purple to red depending upon the value of c.
Larger values of c are more red.

To solve the linear programming problem, follow the level sets
along the direction of the gradient of \(z = 7x_1 + 6x_2\) (shown as
the black arrow) until the last level set (line) intersects the
feasible region. The gradient of \(7x_1 + 6x_2\) is the vector (7,6).

When doing this by hand, draw a single line of the form
\(7x_1 + 6x_2 = c\) and then simply draw parallel lines in the
direction of the gradient. At some point, these lines will fail to
intersect the feasible region. The last line to intersect the feasible
region will do so at a point that maximizes the profit. In this case,
the point that maximizes \(z(x_1,x_2) = 7x_1 + 6x_2\), subject to the
constraints given, is \((x_1^*, x_2^*) = (16,72)\). This point is the
intersection of the two lines \(3x_1 + x_2 = 120\) and
\(x_1 + 2x_2 = 160\). Note that the point of optimality
\((x_1^*, x_2^*) = (16,72)\) is at a corner of the feasible region. In
this case, the constraints

\[ 3x_1 + x_2 \leq 120, \qquad x_1 + 2x_2 \leq 160 \]

are both binding (equal to their respective right-hand sides), while
the other constraints are nonbinding. In general, we see that when
an optimal solution to a linear programming problem exists, at least
one optimal solution will be at the intersection of several binding
constraints; that is, it will occur at a corner of a
higher-dimensional polyhedron.
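The corner-point observation suggests a brute-force check for small problems. The following Python sketch is our own illustration and not the simplex algorithm: it intersects every pair of constraint boundary lines from problem (11.9), keeps the intersection points that satisfy all constraints, and evaluates the objective at each. The names `corners` and `z` are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Each tuple (a1, a2, b) encodes the constraint a1*x1 + a2*x2 <= b.
constraints = [
    (3, 1, 120),    # hours available for making toys
    (1, 2, 160),    # hours available for finishing toys
    (1, 0, 35),     # at most 35 planes
    (-1, 0, 0),     # x1 >= 0
    (0, -1, 0),     # x2 >= 0
]

def corners(constraints):
    """Intersect every pair of boundary lines and keep the feasible points."""
    points = set()
    for (a1, a2, b), (c1, c2, e) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue                      # parallel boundary lines
        x1 = F(b * c2 - a2 * e, det)      # Cramer's rule
        x2 = F(a1 * e - b * c1, det)
        if all(p * x1 + q * x2 <= r for p, q, r in constraints):
            points.add((x1, x2))
    return points

def z(point):
    """Objective function of problem (11.9)."""
    return 7 * point[0] + 6 * point[1]

best = max(corners(constraints), key=z)
```

Enumerating corners is only practical for tiny problems, since the number of candidate intersections grows quickly with the number of constraints and variables.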


Remark 11.9. It can sometimes happen that a linear programming 
problem has an infinite number of alternative optimal solutions. We 
illustrate this in the following example. 


Example 11.10. Suppose the toy maker in Example 11.5 finds that
it can sell planes for a profit of $18 each instead of $7 each. The new
linear programming problem becomes

\[ \begin{aligned}
\max\ & z(x_1,x_2) = 18x_1 + 6x_2 \\
\text{s.t.}\ & 3x_1 + x_2 \leq 120 \\
& x_1 + 2x_2 \leq 160 \\
& x_1 \leq 35 \\
& x_1 \geq 0 \\
& x_2 \geq 0.
\end{aligned} \tag{11.10} \]


Applying the graphical method for finding optimal solutions to linear
programming problems yields the plot shown in Fig. 11.2.

Fig. 11.2 An example of infinitely many alternative optimal solutions
in a linear programming problem. The level curves for
\(z(x_1, x_2) = 18x_1 + 6x_2\) are parallel to one face of the polygon
boundary of the feasible region. Moreover, this side contains the
points of the greatest value for \(z(x_1, x_2)\) inside the feasible
region. Any combination of \((x_1, x_2)\) on the line
\(3x_1 + x_2 = 120\) for \(x_1 \in [16,35]\) will provide the largest
possible value that \(z(x_1, x_2)\) can take in the feasible region S.


The level curves for the function \(z(x_1, x_2) = 18x_1 + 6x_2\) are
parallel to one face (edge) of the polygonal boundary of the feasible
region. Hence, as we move further up and to the right in the direction
of the gradient (corresponding to larger and larger values of
\(z(x_1, x_2)\)), we see that there is not one point on the boundary of
the feasible region that intersects that level set with the greatest
value, but, instead, a side of the polygonal boundary described by the
line \(3x_1 + x_2 = 120\), where \(x_1 \in [16,35]\). Let

\[ S = \{(x_1, x_2) \mid 3x_1 + x_2 \leq 120,\ x_1 + 2x_2 \leq 160,\ x_1 \leq 35,\ x_1, x_2 \geq 0\}. \]

That is, S is the feasible region of the problem. Then, for any value
of \(x_1^* \in [16,35]\) and any value \(x_2^*\) so that
\(3x_1^* + x_2^* = 120\), we have
\(z(x_1^*, x_2^*) \geq z(x_1, x_2)\) for all \((x_1, x_2) \in S\).
Since there are infinitely many values that \(x_1\) and \(x_2\) may
take on, we see this problem has an infinite number of alternative
optimal solutions.
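A short numerical check makes the alternative optima concrete. The sketch below (our own illustration, with hypothetical helper names) evaluates the objective of problem (11.10) at several points on the edge where \(3x_1 + x_2 = 120\) and compares them with the remaining corners of the feasible region.

```python
from fractions import Fraction as F

def feasible(x1, x2):
    """Check the constraints of problem (11.10)."""
    return (3 * x1 + x2 <= 120 and x1 + 2 * x2 <= 160
            and x1 <= 35 and x1 >= 0 and x2 >= 0)

def z(x1, x2):
    """Objective function of problem (11.10)."""
    return 18 * x1 + 6 * x2

# Sample points on the edge 3*x1 + x2 = 120 with x1 in [16, 35].
edge_points = [(x1, 120 - 3 * x1) for x1 in (F(16), F(51, 2), F(30), F(35))]
values = {z(*p) for p in edge_points}

# The remaining corners of the feasible region, for comparison.
other_corners = [(0, 0), (35, 0), (0, 80)]
other_values = [z(*p) for p in other_corners]
```

Every sampled point on the edge gives the same objective value because \(18x_1 + 6x_2 = 6(3x_1 + x_2)\), which is constant along that edge.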
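11.4 Some Basic Facts about Linear Programming
Problems


Definition 11.11 (Canonical form). A maximization linear
programming problem is in canonical form if it is written as

\[ \begin{aligned}
\max\ & z(\mathbf{x}) = \mathbf{c}^T\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} \leq \mathbf{b} \\
& \mathbf{x} \geq \mathbf{0}.
\end{aligned} \tag{11.11} \]

A minimization linear programming problem is in canonical form
if it is written as

\[ \begin{aligned}
\min\ & z(\mathbf{x}) = \mathbf{c}^T\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} \geq \mathbf{b} \\
& \mathbf{x} \geq \mathbf{0}.
\end{aligned} \tag{11.12} \]


Definition 11.12 (Standard form). A linear programming
problem is in standard form if it is written as

\[ \begin{aligned}
\max\ & z(\mathbf{x}) = \mathbf{c}^T\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} = \mathbf{b} \\
& \mathbf{x} \geq \mathbf{0}.
\end{aligned} \tag{11.13} \]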


Remark 11.13. The following theorem is outside the scope of the 
course, but it is useful to know. 


Theorem 11.14. Every linear programming problem in canonical 
form can be put into standard form. 


Remark 11.15. To illustrate Theorem 11.14, we note that it is
relatively easy to convert any inequality constraint into an equality
constraint. Consider the inequality constraint

\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \leq b_i. \tag{11.14} \]

We can add a new slack variable \(s_i\) to this constraint to obtain

\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n + s_i = b_i. \]

Obviously, this slack variable \(s_i \geq 0\). The slack variable then
becomes just another variable whose value we must discover as we solve
the linear program for which Eq. (11.14) is a constraint.

We can deal with constraints of the form

\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \geq b_i \tag{11.15} \]

in a similar way. In this case, we subtract a surplus variable
\(s_i\) to obtain

\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n - s_i = b_i. \]

Again, we must have \(s_i \geq 0\).
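
Example 11.16. Consider the linear programming problem

\[ \begin{aligned}
\max\ & z(x_1,x_2) = 2x_1 - x_2 \\
\text{s.t.}\ & x_1 - x_2 \leq 1 \\
& 2x_1 + x_2 \geq 6 \\
& x_1, x_2 \geq 0.
\end{aligned} \]

This linear programming problem can be put into standard form by
using both a slack and a surplus variable. We obtain

\[ \begin{aligned}
\max\ & z(x_1,x_2) = 2x_1 - x_2 \\
\text{s.t.}\ & x_1 - x_2 + s_1 = 1 \\
& 2x_1 + x_2 - s_2 = 6 \\
& x_1, x_2, s_1, s_2 \geq 0.
\end{aligned} \]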
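
The conversion to standard form is mechanical, as the following Python sketch suggests. It is an illustration only: the function name `to_standard_form` and the encoding of constraint senses as the strings "<=", ">=", and "=" are our own assumptions. Each inequality row receives one extra column, +1 for a slack variable or -1 for a surplus variable.

```python
def to_standard_form(A, senses, b):
    """Append one slack (+1) or surplus (-1) column per inequality row."""
    extra = [i for i, s in enumerate(senses) if s in ("<=", ">=")]
    A_std = []
    for i, row in enumerate(A):
        cols = [0] * len(extra)
        if i in extra:
            # Slack is added to a <= row; surplus is subtracted from a >= row.
            cols[extra.index(i)] = 1 if senses[i] == "<=" else -1
        A_std.append(list(row) + cols)
    return A_std, list(b)

# Constraints of Example 11.16: x1 - x2 <= 1 and 2*x1 + x2 >= 6.
A = [[1, -1], [2, 1]]
senses = ["<=", ">="]
b = [1, 6]
A_std, b_std = to_standard_form(A, senses, b)

# A feasible point of the original problem and its slack/surplus values.
x = [3, 2]
s = [b[0] - (x[0] - x[1]), (2 * x[0] + x[1]) - b[1]]   # s1 = 0, s2 = 2
lhs = [sum(a * v for a, v in zip(row, x + s)) for row in A_std]
```

The non-negativity requirement on the new variables is not stored in the matrix; it is part of the condition \(\mathbf{x} \geq \mathbf{0}\) in the standard form.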

Remark 11.17. We assume, when dealing with linear programming
problems in standard or canonical form, that the matrix
\(\mathbf{A}\) has full row rank, and if not, we adjust it so this is
true. The following theorem fully characterizes the solutions to
linear programs. The proof can be found in Ref. [29].


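Theorem 11.18. Consider any linear programming problem

\[ P \quad \left\{ \begin{aligned}
\max\ & z(\mathbf{x}) = \mathbf{c}^T\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} \leq \mathbf{b} \\
& \mathbf{x} \geq \mathbf{0}.
\end{aligned} \right. \]

Then, there are exactly four possibilities:

(1) There is a unique solution to problem P denoted by
\(\mathbf{x}^*\).

(2) There are an infinite number of alternative optimal solutions
to P.

(3) There is no solution to P because there is no \(\mathbf{x}\) that
satisfies \(\mathbf{A}\mathbf{x} \leq \mathbf{b}\) and
\(\mathbf{x} \geq \mathbf{0}\).

(4) There is no solution to P because the problem is unbounded. That
is, for any \(\mathbf{x}\) such that
\(\mathbf{A}\mathbf{x} \leq \mathbf{b}\) and
\(\mathbf{x} \geq \mathbf{0}\), there is another
\(\mathbf{x}' \neq \mathbf{x}\) so that
\(\mathbf{A}\mathbf{x}' \leq \mathbf{b}\),
\(\mathbf{x}' \geq \mathbf{0}\), and
\(\mathbf{c}^T\mathbf{x} < \mathbf{c}^T\mathbf{x}'\).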


11.5 Solving Linear Programming Problems 
with a Computer 


Remark 11.19. There are a few ways to solve linear programming 
problems. The most common approach is called the simplex algo- 
rithm. The simplex algorithm is outside the scope of this book. 
However, those interested can see Ref. [29]. 


Example 11.20. We illustrate how to solve a linear programming
problem using Mathematica™ because it is particularly easy to do
so. Suppose I wish to design a diet consisting of Ramen noodles and
ice cream. I'm interested in spending as little money as possible, but
I want to ensure that I eat at least 1200 calories per day and that I
get at least 20 grams of protein per day. Assume that each serving
of Ramen costs $1 and contains 100 calories and 2 grams of protein.
Assume that each serving of ice cream costs $1.50 and contains 200
calories and 3 grams of protein.

We can construct a linear programming problem out of this scenario.
Let $x_1$ be the amount of Ramen I consume and $x_2$ be the
amount of ice cream I consume. Our objective function is the cost

$$x_1 + 1.5x_2. \tag{11.16}$$

The constraints describe our protein requirements as

$$2x_1 + 3x_2 \ge 20 \tag{11.17}$$

and our calorie requirements (expressed in terms of hundreds of
calories)

$$x_1 + 2x_2 \ge 12. \tag{11.18}$$

This leads to the following linear programming problem

$$\begin{aligned}
\min\ & x_1 + 1.5x_2 \\
\text{s.t.}\ & 2x_1 + 3x_2 \ge 20 \\
& x_1 + 2x_2 \ge 12 \\
& x_1, x_2 \ge 0.
\end{aligned} \tag{11.19}$$

In Mathematica™, comments are written as (*Comment*). The
Mathematica™ code to solve this problem is shown as follows.


FindMinimum[
  {
    x1 + 1.5*x2,         (*Objective*)
    2*x1 + 3*x2 >= 20,   (*Constraint 1*)
    x1 + 2*x2 >= 12,     (*Constraint 2*)
    x1 >= 0, x2 >= 0     (*Non-negativity Constraints*)
  },
  {x1, x2}               (*Variables*)
]


Note that it is relatively easy to interpret this code because
Mathematica™ is a symbolic language. Other tools (such as
Python and MATLAB™) can also be used. The solution returned
by the solver is $x_1 = x_2 = 4$ with a cost of $10. It turns out there
are an infinite number of alternative optimal solutions to this problem,
which can be demonstrated through a diagram.
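
As a cross-check on the diet problem, the following Python sketch enumerates the corner points of the feasible region of Eq. (11.19) and evaluates the cost at each one. It is only an illustration, not the simplex algorithm: it assumes that a minimum exists and is attained at a corner point, which holds here because the cost coefficients and the variables are non-negative. The helper names (`intersect`, `feasible`, `total_cost`) are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Each constraint is (a1, a2, r), meaning a1*x1 + a2*x2 >= r (Example 11.20).
CONSTRAINTS = [
    (F(2), F(3), F(20)),  # protein, Eq. (11.17)
    (F(1), F(2), F(12)),  # calories in hundreds, Eq. (11.18)
    (F(1), F(0), F(0)),   # x1 >= 0
    (F(0), F(1), F(0)),   # x2 >= 0
]
COST = (F(1), F(3, 2))    # dollars per serving of Ramen and ice cream


def intersect(c1, c2):
    """Solve the 2x2 system in which both constraints hold with equality."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel boundary lines never meet
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(p):
    return all(a1 * p[0] + a2 * p[1] >= r for a1, a2, r in CONSTRAINTS)


def total_cost(p):
    return COST[0] * p[0] + COST[1] * p[1]


# Corner points are feasible intersections of pairs of boundary lines.
vertices = set()
for c1, c2 in combinations(CONSTRAINTS, 2):
    p = intersect(c1, c2)
    if p is not None and feasible(p):
        vertices.add(p)

best_cost = min(total_cost(p) for p in vertices)
optimal_vertices = sorted(p for p in vertices if total_cost(p) == best_cost)

print("minimum cost:", best_cost)
for x1, x2 in optimal_vertices:
    print("optimal corner point: x1 =", x1, " x2 =", x2)
```

Two distinct corner points attain the minimum cost, so every point on the segment joining them is also optimal. This is the source of the alternative optimal solutions: the cost function is parallel to the protein constraint.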


11.6 Karush—Kuhn—Tucker Conditions 


Remark 11.21. The single most important thing to learn about
linear programming (or optimization in general) is the
Karush-Kuhn-Tucker (KKT) theorem giving optimality conditions. These
conditions provide necessary and sufficient conditions for a point
$\mathbf{x} \in \mathbb{R}^n$ to be an optimal solution to a linear
programming problem. We state the Karush-Kuhn-Tucker theorem but do not
prove it. A proof can be found in Ref. [29].


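Theorem 11.22. Consider the linear programming problem

$$P \left\{ \begin{aligned}
\max\ & \mathbf{c}\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} \le \mathbf{b} \\
& \mathbf{x} \ge \mathbf{0},
\end{aligned} \right. \tag{11.20}$$

with $\mathbf{A} \in \mathbb{R}^{m \times n}$, $\mathbf{b} \in \mathbb{R}^m$
and (row vector) $\mathbf{c} \in \mathbb{R}^n$. Then,
$\mathbf{x}^* \in \mathbb{R}^n$ is an optimal solution to Problem P if and
only if there exist (row) vectors $\mathbf{w}^* \in \mathbb{R}^m$ and
$\mathbf{v}^* \in \mathbb{R}^n$ and a slack variable vector
$\mathbf{s}^* \in \mathbb{R}^m$ so that

$$\text{primal feasibility} \left\{ \begin{aligned}
& \mathbf{A}\mathbf{x}^* + \mathbf{s}^* = \mathbf{b} \\
& \mathbf{x}^* \ge \mathbf{0}
\end{aligned} \right. \tag{11.21}$$

$$\text{dual feasibility} \left\{ \begin{aligned}
& \mathbf{w}^*\mathbf{A} - \mathbf{v}^* = \mathbf{c} \\
& \mathbf{w}^* \ge \mathbf{0} \\
& \mathbf{v}^* \ge \mathbf{0}
\end{aligned} \right. \tag{11.22}$$

$$\text{complementary slackness} \left\{ \begin{aligned}
& \mathbf{w}^*(\mathbf{A}\mathbf{x}^* - \mathbf{b}) = 0 \\
& \mathbf{v}^*\mathbf{x}^* = 0.
\end{aligned} \right. \tag{11.23}$$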


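Remark 11.23. The vectors $\mathbf{w}^*$ and $\mathbf{v}^*$ are sometimes
called dual variables for reasons that will be clear shortly. They are
also sometimes called Lagrange multipliers. You may have encountered
Lagrange multipliers in vector calculus. These are the same kind of
variables, except applied to linear optimization problems. There is
one element in the dual variable vector $\mathbf{w}^*$ for each constraint
of the form $\mathbf{A}\mathbf{x} \le \mathbf{b}$ and one element in the
dual variable vector $\mathbf{v}^*$ for each constraint of the form
$\mathbf{x} \ge \mathbf{0}$.

Example 11.24. Consider the toy maker problem (Eq. (11.9)) with
dual variables (Lagrange multipliers) listed next to their
corresponding constraints:

$$\begin{aligned}
\max\ & z(x_1, x_2) = 7x_1 + 6x_2 & & \text{Dual Variable} \\
\text{s.t.}\ & 3x_1 + x_2 \le 120 & & (w_1) \\
& x_1 + 2x_2 \le 160 & & (w_2) \\
& x_1 \le 35 & & (w_3) \\
& x_1 \ge 0 & & (v_1) \\
& x_2 \ge 0 & & (v_2).
\end{aligned}$$

In this problem, we have

$$\mathbf{A} = \begin{bmatrix} 3 & 1 \\ 1 & 2 \\ 1 & 0 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 120 \\ 160 \\ 35 \end{bmatrix}, \quad
\mathbf{c} = \begin{bmatrix} 7 & 6 \end{bmatrix}.$$

Then, the KKT conditions can be written as

$$\text{primal feasibility} \left\{ \begin{aligned}
& \begin{bmatrix} 3 & 1 \\ 1 & 2 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \le
\begin{bmatrix} 120 \\ 160 \\ 35 \end{bmatrix} \\
& \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \ge
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\end{aligned} \right.$$

$$\text{dual feasibility} \left\{ \begin{aligned}
& \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix}
\begin{bmatrix} 3 & 1 \\ 1 & 2 \\ 1 & 0 \end{bmatrix} -
\begin{bmatrix} v_1 & v_2 \end{bmatrix} =
\begin{bmatrix} 7 & 6 \end{bmatrix} \\
& \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \ge
\begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \\
& \begin{bmatrix} v_1 & v_2 \end{bmatrix} \ge
\begin{bmatrix} 0 & 0 \end{bmatrix}
\end{aligned} \right.$$

$$\text{complementary slackness} \left\{ \begin{aligned}
& \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix}
\left( \begin{bmatrix} 3 & 1 \\ 1 & 2 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} -
\begin{bmatrix} 120 \\ 160 \\ 35 \end{bmatrix} \right) = 0 \\
& \begin{bmatrix} v_1 & v_2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0.
\end{aligned} \right.$$

Note that we are suppressing the slack variables $\mathbf{s}$ in the
primal feasibility expression for brevity. Recall that at optimality, we
had $x_1 = 16$ and $x_2 = 72$. The binding constraints in this case were

$$3x_1 + x_2 \le 120 \quad \text{and} \quad x_1 + 2x_2 \le 160.$$

To see this, note that $3(16) + 72 = 120$ and $16 + 2(72) = 160$. Then,
we should be able to express $\mathbf{c} = [7\ \ 6]$ (the vector of
coefficients of the objective function) as a positive combination of the
gradients of the binding constraints

$$\begin{aligned}
\nabla(7x_1 + 6x_2) &= \begin{bmatrix} 7 & 6 \end{bmatrix} \\
\nabla(3x_1 + x_2) &= \begin{bmatrix} 3 & 1 \end{bmatrix} \\
\nabla(x_1 + 2x_2) &= \begin{bmatrix} 1 & 2 \end{bmatrix}.
\end{aligned}$$

This is what dual feasibility asserts to be true. That is, we wish to
solve the linear equation

$$\begin{bmatrix} w_1 & w_2 \end{bmatrix}
\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix} =
\begin{bmatrix} 7 & 6 \end{bmatrix}. \tag{11.24}$$

The result is the system of equations

$$\begin{aligned}
3w_1 + w_2 &= 7, \\
w_1 + 2w_2 &= 6.
\end{aligned}$$

A solution to this system is $w_1 = \tfrac{8}{5}$ and
$w_2 = \tfrac{11}{5}$. This fact is illustrated in Fig. 11.3.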

Figure 11.3 shows that the gradient lies in the cone formed by
the gradients of the binding constraints at the optimal point for the
toy maker problem. This is generally true. At a point of optimality,
the gradient will lie inside the cone generated by the gradients of the
binding constraints. Since $x_1, x_2 > 0$, we must have $v_1 = v_2 = 0$.
Moreover, since $x_1 < 35$, we know that $x_1 \le 35$ is not a binding
constraint and thus its dual variable $w_3$ is also zero. This leads to
the conclusion that

$$\begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix} =
\begin{bmatrix} 16 \\ 72 \end{bmatrix}, \quad
\begin{bmatrix} w_1^* & w_2^* & w_3^* \end{bmatrix} =
\begin{bmatrix} 8/5 & 11/5 & 0 \end{bmatrix}, \quad
\begin{bmatrix} v_1^* & v_2^* \end{bmatrix} =
\begin{bmatrix} 0 & 0 \end{bmatrix},$$

and the KKT conditions are satisfied.

Fig. 11.3 The gradient cone: At optimality, the cost vector c is obtuse with
respect to the directions formed by the binding constraints. It is also contained
inside the cone of the gradients of the binding constraints.
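
The KKT conditions of Theorem 11.22 can be checked numerically for the toy maker problem. The sketch below uses exact rational arithmetic and the values stated in Example 11.24; the variable names (`primal_feasible`, `dual_feasible`, `complementary_slackness`) are illustrative and not part of any library.

```python
from fractions import Fraction as F

# Toy maker data from Example 11.24.
A = [[3, 1], [1, 2], [1, 0]]
b = [120, 160, 35]
c = [7, 6]

# Candidate primal and dual values stated in the example.
x = [F(16), F(72)]
w = [F(8, 5), F(11, 5), F(0)]
v = [F(0), F(0)]

Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]
wA = [sum(w[i] * A[i][j] for i in range(3)) for j in range(2)]

# Primal feasibility: Ax <= b and x >= 0 (the slack is s = b - Ax).
primal_feasible = (all(Ax[i] <= b[i] for i in range(3))
                   and all(xj >= 0 for xj in x))

# Dual feasibility: wA - v = c, w >= 0, v >= 0.
dual_feasible = (all(wA[j] - v[j] == c[j] for j in range(2))
                 and all(wi >= 0 for wi in w)
                 and all(vj >= 0 for vj in v))

# Complementary slackness: w(Ax - b) = 0 and vx = 0.
complementary_slackness = (
    sum(w[i] * (Ax[i] - b[i]) for i in range(3)) == 0
    and sum(v[j] * x[j] for j in range(2)) == 0
)

print(primal_feasible, dual_feasible, complementary_slackness)
```

All three checks succeed, and the third entry of `Ax` is 16, strictly below 35, which is why the dual variable of the non-binding constraint must be zero.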


11.7 Duality 


Remark 11.25. In this section, we show that to each linear pro- 
gramming problem (the primal problem), we may associate another 
linear programming problem (the dual linear programming problem). 
These two problems are closely related to each other, and an anal- 
ysis of the dual problem can provide deep insight into the primal 
problem. 


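Definition 11.26 (Dual program). Consider the linear programming
problem

$$P \left\{ \begin{aligned}
\max\ & \mathbf{c}^T\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} \le \mathbf{b} \\
& \mathbf{x} \ge \mathbf{0}.
\end{aligned} \right. \tag{11.25}$$

Then, the dual problem for Problem P is

$$D \left\{ \begin{aligned}
\min\ & \mathbf{w}\mathbf{b} \\
\text{s.t.}\ & \mathbf{w}\mathbf{A} \ge \mathbf{c} \\
& \mathbf{w} \ge \mathbf{0}.
\end{aligned} \right. \tag{11.26}$$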


Remark 11.27. Let $\mathbf{v}$ be a vector of surplus variables. Then, we
can transform Problem D into standard form as

$$D_S \left\{ \begin{aligned}
\min\ & \mathbf{w}\mathbf{b} \\
\text{s.t.}\ & \mathbf{w}\mathbf{A} - \mathbf{v} = \mathbf{c} \\
& \mathbf{w} \ge \mathbf{0} \\
& \mathbf{v} \ge \mathbf{0}.
\end{aligned} \right. \tag{11.27}$$

Thus, we already see an intimate relationship between duality and
the KKT conditions. The feasible region of the dual problem (in
standard form) is precisely the dual feasibility constraints of the KKT
conditions for the primal problem.

In this formulation, we see that we have assigned a dual variable
$w_i$ ($i = 1, \ldots, m$) to each constraint in the system of
inequalities $\mathbf{A}\mathbf{x} \le \mathbf{b}$ of the primal problem.
Likewise, dual variables $\mathbf{v}$ can be thought of as corresponding
to the constraints in $\mathbf{x} \ge \mathbf{0}$.


Remark 11.28. The proof of the following lemma can be found in 
Ref. [29]. 


Lemma 11.29. The dual of the dual problem is the primal problem. 


Remark 11.30. Lemma 11.29 shows that the notions of dual and 
primal can be exchanged and that it is simply a matter of perspec- 
tive which problem is the dual problem and which is the primal 
problem. Likewise, by transforming problems into canonical forms, 
we can develop dual problems for any linear programming problem. 

The process of developing these formulations can be exceptionally
tedious as it requires enumeration of all the possible combinations
of various linear and variable constraints. The following figure
summarizes the process of converting an arbitrary primal problem into
its dual.

MINIMIZATION PROBLEM             MAXIMIZATION PROBLEM
Variable $\ge 0$          <->    Constraint $\le$
Variable $\le 0$          <->    Constraint $\ge$
Variable unrestricted     <->    Constraint $=$
Constraint $\ge$          <->    Variable $\ge 0$
Constraint $\le$          <->    Variable $\le 0$
Constraint $=$            <->    Variable unrestricted

Fig. 11.4 Table of dual conversions: To create a dual problem, assign a
dual variable to each constraint of the form
$\mathbf{A}\mathbf{x} \circ \mathbf{b}$, where $\circ$ represents a binary
relation. Then, use the chart to determine the appropriate sign of the
inequality in the dual problem, as well as the nature of the dual
variables. The signs of the primal variables are then used to determine
the structure of the dual constraints.


Example 11.31. Consider the problem of finding the dual problem
for the toy maker problem (Example 11.5) in standard form. The
primal problem is

$$\begin{aligned}
\max\ & 7x_1 + 6x_2 \\
\text{s.t.}\ & 3x_1 + x_2 + s_1 = 120 & & (w_1) \\
& x_1 + 2x_2 + s_2 = 160 & & (w_2) \\
& x_1 + s_3 = 35 & & (w_3) \\
& x_1, x_2, s_1, s_2, s_3 \ge 0.
\end{aligned}$$

Here, we have placed dual variable names ($w_1$, $w_2$, and $w_3$) next to
the constraints to which they correspond.

The primal problem variables in this case are all non-negative. So,
using Fig. 11.4, we know that the constraints of the dual problem will
be greater-than-or-equal-to constraints. Likewise, we know that the
dual variables will be unrestricted in sign since the primal problem
constraints are all equality constraints.

The coefficient matrix is

$$\mathbf{A} = \begin{bmatrix}
3 & 1 & 1 & 0 & 0 \\
1 & 2 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1
\end{bmatrix}.$$

We also have

$$\mathbf{c} = \begin{bmatrix} 7 & 6 & 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} 120 \\ 160 \\ 35 \end{bmatrix}.$$

Since $\mathbf{w} = [w_1\ \ w_2\ \ w_3]$, we know that
$\mathbf{w}\mathbf{A}$ will be

$$\mathbf{w}\mathbf{A} = \begin{bmatrix}
3w_1 + w_2 + w_3 & w_1 + 2w_2 & w_1 & w_2 & w_3
\end{bmatrix}.$$

This vector will be related to $\mathbf{c}$ in the constraints of the dual
problem. Remember that in this case, all variables in the primal problem
are greater-than-or-equal-to zero. Thus, we see that the constraints of
the dual problem are

$$\begin{aligned}
3w_1 + w_2 + w_3 &\ge 7 \\
w_1 + 2w_2 &\ge 6 \\
w_1 &\ge 0 \\
w_2 &\ge 0 \\
w_3 &\ge 0.
\end{aligned}$$

We also have the redundant set of constraints that tells us that
$\mathbf{w}$ is unrestricted because the primal problem had equality
constraints. This will always happen in cases when you've introduced
slack variables into a problem to put it in standard form. This should be
clear from the definition of the dual problem for a maximization problem
in canonical form.

Thus, the whole dual problem becomes

$$\begin{aligned}
\min\ & 120w_1 + 160w_2 + 35w_3 \\
\text{s.t.}\ & 3w_1 + w_2 + w_3 \ge 7 \\
& w_1 + 2w_2 \ge 6 \\
& w_1 \ge 0 \\
& w_2 \ge 0 \\
& w_3 \ge 0 \\
& \mathbf{w}\ \text{unrestricted}.
\end{aligned} \tag{11.28}$$

Again, note that, in reality, the constraints we derived from the
$\mathbf{w}\mathbf{A} \ge \mathbf{c}$ part of the dual problem make the
constraints "$\mathbf{w}$ unrestricted" redundant, for in fact
$\mathbf{w} \ge \mathbf{0}$, just as we would expect it to be if we'd
found the dual of the toy maker problem given in canonical form.


Theorem 11.32 (Strong duality theorem). Consider Problem 
P and Problem D. Then, 


(weak duality) $\mathbf{c}\mathbf{x}^* \le \mathbf{w}^*\mathbf{b}$; thus,
every feasible solution to the primal problem provides a lower bound for
the dual and every feasible solution to the dual problem provides an
upper bound to the primal problem.


Furthermore, exactly one of the following statements is true: 


(1) Both Problem P and Problem D possess optimal solutions x* and 
w*, respectively, and cx* = w*b. 

(2) Problem P is unbounded and Problem D is infeasible. 

(3) Problem D is unbounded and Problem P is infeasible. 


(4) Both problems are infeasible. 
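
To make the weak and strong duality statements concrete, the following Python sketch evaluates a few feasible points of the toy maker problem and of its dual (Eq. (11.28)). The sample points are chosen by hand for illustration and are an assumption of this sketch; only the optimal pair comes from Example 11.24.

```python
from fractions import Fraction as F

# Toy maker problem in canonical form (Example 11.24).
A = [[3, 1], [1, 2], [1, 0]]
b = [120, 160, 35]
c = [7, 6]


def primal_feasible(x):
    rows_ok = all(sum(A[i][j] * x[j] for j in range(2)) <= b[i]
                  for i in range(3))
    return rows_ok and all(xj >= 0 for xj in x)


def dual_feasible(w):
    cols_ok = all(sum(w[i] * A[i][j] for i in range(3)) >= c[j]
                  for j in range(2))
    return cols_ok and all(wi >= 0 for wi in w)


def primal_value(x):
    return sum(c[j] * x[j] for j in range(2))


def dual_value(w):
    return sum(w[i] * b[i] for i in range(3))


# Hand-picked feasible points; the last entry of each list is optimal.
primal_points = [(0, 0), (35, 15), (0, 80), (16, 72)]
dual_points = [(7, 0, 0), (0, 3, 4), (2, 2, 0), (F(8, 5), F(11, 5), 0)]

all_feasible = (all(primal_feasible(x) for x in primal_points)
                and all(dual_feasible(w) for w in dual_points))

# Weak duality: every primal value is at most every dual value.
weak_duality_holds = all(primal_value(x) <= dual_value(w)
                         for x in primal_points for w in dual_points)

best_primal = max(primal_value(x) for x in primal_points)
best_dual = min(dual_value(w) for w in dual_points)
print(all_feasible, weak_duality_holds, best_primal, best_dual)
```

The largest primal value and the smallest dual value in the samples coincide at 544, attained by $x^* = (16, 72)$ and $w^* = (8/5, 11/5, 0)$, as case (1) of the theorem predicts.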


Remark 11.33. This final theorem illustrates the true nature of 
duality. Two linear programming problems are dual if they share 
KKT conditions, i.e., if they share conditions for optimality. We con- 
clude that the KKT conditions are fundamental. The optimization 
problems are simply expressions of these systems of equations and 
inequalities. 


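Theorem 11.34. Problem D has an optimal solution
$\mathbf{w}^* \in \mathbb{R}^m$ if and only if there exist vectors
$\mathbf{x}^* \in \mathbb{R}^n$ and $\mathbf{s}^* \in \mathbb{R}^m$ and a
vector of surplus variables $\mathbf{v}^* \in \mathbb{R}^n$ such that:

$$\text{primal feasibility} \left\{ \begin{aligned}
& \mathbf{w}^*\mathbf{A} \ge \mathbf{c} \\
& \mathbf{w}^* \ge \mathbf{0}
\end{aligned} \right. \tag{11.29}$$

$$\text{dual feasibility} \left\{ \begin{aligned}
& \mathbf{A}\mathbf{x}^* + \mathbf{s}^* = \mathbf{b} \\
& \mathbf{x}^* \ge \mathbf{0} \\
& \mathbf{s}^* \ge \mathbf{0}
\end{aligned} \right. \tag{11.30}$$

$$\text{complementary slackness} \left\{ \begin{aligned}
& (\mathbf{w}^*\mathbf{A} - \mathbf{c})\mathbf{x}^* = 0 \\
& \mathbf{w}^*\mathbf{s}^* = 0.
\end{aligned} \right. \tag{11.31}$$

Furthermore, these KKT conditions are equivalent to the KKT
conditions for the primal problem.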


11.8 Chapter Notes 


The problem of solving a system of linear inequalities was studied
as early as the 1800s, and a solution was given by Fourier. This
method is now called the Fourier-Motzkin method [116] and is not
usually covered in modern treatments of linear programming (except
in its historical context). Foundational contributions to linear
programming were made by George B. Dantzig, whose solution of two
open problems in statistics (mistaken for homework while he was a
student) [117-119] may have formed the basis for part of the script
of Good Will Hunting. Dantzig developed the simplex algorithm for
solving arbitrary linear programming problems and is considered the
father of modern algorithmic optimization. He first proved the duality
results discussed in this chapter, but it was Von Neumann who
conjectured them on meeting with Dantzig to discuss linear programming
[120]. Dantzig's work (at first derided by colleagues for being
linear in a nonlinear world) was defended by Von Neumann.
Interestingly, linear programming problems are often at the core of many
problems in optimization [118,119]. Dantzig's work on flows in networks
with Fulkerson will be covered in the following chapter.

Amazingly, though the simplex algorithm is, in the worst case,
exponential in running time, solving a linear programming problem
can be accomplished in polynomial time [121]. This relatively recent
result was improved and made practical by Karmarkar's interior-point
method [122], which has spawned an entirely new field in computational
optimization. Interestingly, the linear programs that arise
in the study of flow on graphs are known to be solvable in polynomial
time (see Chapter 6). Even more amazingly, in 2004, Spielman and
Teng proved that the average running time for the simplex algorithm
is, in fact, polynomial [123], helping to explain decades of
observations on the general efficiency of that algorithm despite it
being an exponential algorithm in the worst case.


11.9 Exercises


Exercise 11.1 

Show that a minimization linear programming problem in canonical
form can be rephrased as a maximization linear programming problem
in canonical form. [Hint: Multiply the objective and constraints
by $-1$. Define new matrices.]


Exercise 11.2 
Consider the problem:

$$\begin{aligned}
\max\ & x_1 + x_2 \\
\text{s.t.}\ & 2x_1 + x_2 \le 4 \\
& x_1 + 2x_2 \le 6 \\
& x_1, x_2 \ge 0.
\end{aligned}$$

Write the KKT conditions for an optimal point for this problem.
(You will have a vector $\mathbf{w} = [w_1\ \ w_2]$ and a vector
$\mathbf{v} = [v_1\ \ v_2]$.) Draw the feasible region of the problem, and
use MATLAB™ to solve the problem. At the point of optimality, identify
the binding constraints and draw their gradients. Show that the KKT
conditions hold. (Specifically, find $\mathbf{w}$ and $\mathbf{v}$.)


Exercise 11.3 
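Find the KKT conditions for the problem

$$\begin{aligned}
\min\ & \mathbf{c}\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} \ge \mathbf{b} \\
& \mathbf{x} \ge \mathbf{0}.
\end{aligned} \tag{11.32}$$
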
[Hint: Remember that every minimization problem can be converted
to a maximization problem by multiplying the objective function by
$-1$, and the constraints $\mathbf{A}\mathbf{x} \ge \mathbf{b}$ are
equivalent to the constraints $-\mathbf{A}\mathbf{x} \le -\mathbf{b}$.]


Exercise 11.4
Identify the dual problem for

$$\begin{aligned}
\max\ & x_1 + x_2 \\
\text{s.t.}\ & 2x_1 + x_2 \ge 4 \\
& x_1 + 2x_2 \le 6 \\
& x_1, x_2 \ge 0.
\end{aligned}$$


Exercise 11.5 
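Use the table or the definition of duality to determine the dual for
the problem

$$\begin{aligned}
\min\ & \mathbf{c}\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} \ge \mathbf{b} \\
& \mathbf{x} \ge \mathbf{0}.
\end{aligned} \tag{11.33}$$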

Chapter 12 


Max Flow/Min Cut with 
Linear Programming 


Remark 12.1 (Chapter goals). The goal of this chapter is to 
discuss the maximum flow problem using a linear programming 
approach. We prove again the max-flow/min-cut theorem using lin- 
ear programming formalisms and show that the duality of edge flows 
and vertex cuts follows from linear programming duality. 


Remark 12.2. In Chapter 6, we proved the max-flow/min-cut the- 
orem using a standard argument without appealing to the fact that 
it is a linear programming problem. In this chapter, we reconsider 
that problem and show how to phrase the maximum flow problem as 
a linear programming problem and also study its dual. 


12.1 The Maximum Flow Problem as 
a Linear Program 


Remark 12.3. Recall first the flow conservation constraint, which 
will now be viewed in the context of a linear programming problem. 


Definition 12.4 (Flow conservation constraint). Let $G = (V, E)$ be a
digraph with no self-loops and suppose $V = \{v_1, \ldots, v_m\}$
and $E = \{e_1, \ldots, e_n\}$. Let $I(i)$ be the set of edges with
destination vertex $v_i$ and $O(i)$ be the set of edges with source
$v_i$. Then, the flow conservation constraint associated with vertex
$v_i$ is

$$\sum_{k \in O(i)} x_k - \sum_{k \in I(i)} x_k = b_i \quad \forall i. \tag{12.1}$$

Here, $x_k$ is the flow on edge $e_k$ and $b_i$ is the vertex supply (or
demand) at vertex $v_i$.


Remark 12.5. Remember that Eq. (12.1) states that the total flow
out of vertex $v_i$ minus the total flow into $v_i$ must be equal to the
total flow produced at $v_i$. Put more simply, excess flow is neither
created nor destroyed.


Proposition 12.6. Let $G = (V, E)$ be a digraph with no self-loops
and suppose $V = \{v_1, \ldots, v_m\}$. Let $\mathbf{A}$ be the incidence
matrix of $G$ (see Definition 9.10). Then, Eq. (12.1) can be written as

$$\mathbf{A}_{i\cdot}\mathbf{x} = b_i, \tag{12.2}$$

where $\mathbf{x}$ is a vector of variables of the form $x_k$ taken in the
order the edges are represented in $\mathbf{A}$.


Proof. From Definition 9.10, we know that

$$\mathbf{A}_{ik} = \begin{cases}
0 & \text{if } v_i \text{ is not in } e_k, \\
1 & \text{if } v_i \text{ is the source of } e_k, \\
-1 & \text{if } v_i \text{ is the destination of } e_k.
\end{cases} \tag{12.3}$$

The equivalence between Eq. (12.2) and Eq. (12.1) follows at once
from this fact.


Remark 12.7. Recall that the standard basis vector
$\mathbf{e}_i \in \mathbb{R}^{m \times 1}$ has a 1 at position $i$ and 0
everywhere else.


Definition 12.8 (Maximum flow problem). Let $G = (V, E)$ be
a digraph with no self-loops and suppose that $V = \{v_1, \ldots, v_m\}$.
Without loss of generality, suppose that there is no edge connecting
$v_m$ to $v_1$. The maximum flow problem for $G$ is the linear programming
problem

$$\begin{aligned}
\max\ & f \\
\text{s.t.}\ & (\mathbf{e}_m - \mathbf{e}_1) f + \mathbf{A}\mathbf{x} = \mathbf{0} \\
& \mathbf{x} \le \mathbf{u} \\
& \mathbf{x} \ge \mathbf{0} \\
& f\ \text{unrestricted}.
\end{aligned} \tag{12.4}$$

Here, $\mathbf{u}$ is a vector of edge flow capacity values.


Remark 12.9. The constraints
$(\mathbf{e}_m - \mathbf{e}_1) f + \mathbf{A}\mathbf{x} = \mathbf{0}$ are
flow conservation constraints when we assume that there is an (imaginary)
flow backward from $v_m$ to $v_1$ along an edge $(v_m, v_1)$ and that no
flow is produced in the graph. That is, we assume that all flow is
circulating within the graph. The value $f$ determines the amount of
flow that circulates back to vertex $v_1$ from $v_m$ under this
assumption. Since all flows are circulating and excess flow is neither
created nor destroyed, the value of $f$ is then the total flow that flows
from $v_1$ to $v_m$. By maximizing $f$, Eq. (12.4) is exactly computing
the maximum amount of flow that can go from vertex $v_1$ to $v_m$, under
the assumptions that flows are constrained by edge capacities
($\mathbf{x} \le \mathbf{u}$), flows are non-negative
($\mathbf{x} \ge \mathbf{0}$), and flows are neither created nor
destroyed in the graph.
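
The constraints of Eq. (12.4) can be checked on a small example. The Python sketch below builds the incidence matrix of a hypothetical four-vertex digraph (the graph, its capacities, and the flow are assumptions made for illustration and do not come from the text) and verifies that a candidate flow satisfies $(\mathbf{e}_m - \mathbf{e}_1) f + \mathbf{A}\mathbf{x} = \mathbf{0}$ and the capacity bounds.

```python
# Hypothetical digraph: vertices v1..v4; edges are (source, destination).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
CAPACITY = [3, 2, 1, 2, 3]
M = 4  # number of vertices; v1 is the source and v4 the sink


def incidence_matrix(m, edges):
    """Row i, column k is +1 if v_i is the source of e_k, -1 if its destination."""
    A = [[0] * len(edges) for _ in range(m)]
    for k, (src, dst) in enumerate(edges):
        A[src - 1][k] = 1
        A[dst - 1][k] = -1
    return A


def conservation_residual(A, x, f):
    """Return (e_m - e_1) f + A x, which must be the zero vector."""
    m = len(A)
    circulation = [0] * m
    circulation[0] = -1      # imaginary edge (v_m, v_1) enters v_1
    circulation[m - 1] = 1   # and leaves v_m
    return [circulation[i] * f + sum(A[i][k] * x[k] for k in range(len(x)))
            for i in range(m)]


A = incidence_matrix(M, EDGES)
x = [3, 2, 1, 2, 3]   # candidate edge flows
f = 5                 # candidate total flow from v1 to v4

residual = conservation_residual(A, x, f)
within_capacity = all(0 <= xk <= uk for xk, uk in zip(x, CAPACITY))
print(residual, within_capacity)
```

In this example, the two edges leaving $v_1$ have total capacity 5, so no feasible flow can exceed $f = 5$ and the candidate flow is in fact maximal.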


12.2 The Dual of the Flow Maximization Problem 


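Theorem 12.10. The dual linear programming problem for
Eq. (12.4) is

$$\begin{aligned}
\min\ & \sum_{k=1}^{n} u_k h_k \\
\text{s.t.}\ & w_m - w_1 = 1 \\
& w_i - w_j + h_k \ge 0 & & \forall e_k = (v_i, v_j) \in E \\
& h_k \ge 0 & & \forall (v_i, v_j) \in E \\
& w_i\ \text{unrestricted} & & \forall i \in \{1, \ldots, m\}.
\end{aligned} \tag{12.5}$$

Proof. Consider the constraints of Eq. (12.4) and suppose that the
imaginary edge from $v_m$ to $v_1$ is edge $e_0$. We first add slack
variables to constraints of the form $x_k \le u_k$ to obtain

$$x_k + s_k = u_k \quad \forall k \in \{1, \ldots, n\}.$$

The constraints (other than $\mathbf{x} \ge \mathbf{0}$ and $f$
unrestricted) can be rewritten in matrix form as

$$\begin{bmatrix}
-1 & a_{11} & a_{12} & \cdots & a_{1n} & 0 & 0 & \cdots & 0 \\
0 & a_{21} & a_{22} & \cdots & a_{2n} & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & a_{m1} & a_{m2} & \cdots & a_{mn} & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} f \\ x_1 \\ \vdots \\ x_n \\ s_1 \\ \vdots \\ s_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
\tag{12.6}$$

or more simply as

$$\begin{bmatrix}
\mathbf{e}_m - \mathbf{e}_1 & \mathbf{A} & \mathbf{0} \\
\mathbf{0} & \mathbf{I}_n & \mathbf{I}_n
\end{bmatrix}
\begin{bmatrix} f \\ \mathbf{x} \\ \mathbf{s} \end{bmatrix}
=
\begin{bmatrix} \mathbf{0} \\ \mathbf{u} \end{bmatrix},
\tag{12.7}$$

where all elements, written as $\mathbf{0}$, are zero matrices or vectors
of appropriate dimension. This matrix has $2n + 1$ columns and $m + n$
rows. To the first $m$ rows, we associate the dual variables
$w_1, \ldots, w_m$. To the next $n$ rows, we associate the dual variables
$h_1, \ldots, h_n$. Our dual variable vector is then

$$\mathbf{y} = [w_1, \ldots, w_m, h_1, \ldots, h_n].$$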
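Written in matrix form, this is

$$\begin{array}{cc}
\text{Constraints} & \text{Dual variables} \\
\begin{bmatrix}
\mathbf{e}_m - \mathbf{e}_1 & \mathbf{A} & \mathbf{0} \\
\mathbf{0} & \mathbf{I}_n & \mathbf{I}_n
\end{bmatrix}
\begin{bmatrix} f \\ \mathbf{x} \\ \mathbf{s} \end{bmatrix}
=
\begin{bmatrix} \mathbf{0} \\ \mathbf{u} \end{bmatrix}
&
\begin{matrix} \mathbf{w} \\ \mathbf{h} \end{matrix}
\end{array}$$

Since the constraints in Eq. (12.6) are all equality constraints, we
know that these dual variables are unrestricted. From Eq. (12.4), we
know that the objective function vector is

$$\mathbf{c} = [1, 0, \ldots, 0].$$

We can now compute our dual constraints. We use Eq. (11.26) and
Fig. 11.4. The left-hand side of the dual constraints is computed as

$$\begin{bmatrix} \mathbf{w} & \mathbf{h} \end{bmatrix}
\begin{bmatrix}
\mathbf{e}_m - \mathbf{e}_1 & \mathbf{A} & \mathbf{0} \\
\mathbf{0} & \mathbf{I}_n & \mathbf{I}_n
\end{bmatrix}.
\tag{12.8}$$

We fill in the right-hand sides using the vector $\mathbf{c}$, and we use
Fig. 11.4 with the primal variables $(f, \mathbf{x})$.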

Multiplying the matrices in Eq. (12.8) we obtain the first dual
constraint

$$-w_1 + w_m = 1. \tag{12.9}$$

Since this dual constraint corresponds to the variable $f$ in the primal
problem, we know it will be an equality constraint ($f$ is unrestricted)
and that its right-hand side will be 1, the coefficient of $f$ in the
primal problem. The next $n$ constraints are derived similarly and
correspond to the variables $x_1 \ge 0, \ldots, x_n \ge 0$ and so will be
the inequality constraints

$$w_i - w_j + h_k \ge 0. \tag{12.10}$$

This follows since there will be a $-1$ in the matrix whenever edge $e_k$
has as destination vertex $v_j$ and a $+1$ in the matrix whenever edge
$e_k$ has source at vertex $v_i$. Clearly, there is a 1 in the $k$th row
of the identity matrix below the $\mathbf{A}$ matrix in Eq. (12.6), thus
yielding the $+h_k$ term. The final $n$ constraints have the form

$$h_k \ge 0 \tag{12.11}$$

and are derived from the last $n$ columns of the matrix in Eq. (12.6).
These constraints correspond to the variables
$s_1 \ge 0, \ldots, s_n \ge 0$. The objective function of the dual problem
is computed as

$$\begin{bmatrix} \mathbf{w} & \mathbf{h} \end{bmatrix}
\begin{bmatrix} \mathbf{0} \\ \mathbf{u} \end{bmatrix}.$$

This yields the objective function

$$\sum_{k=1}^{n} u_k h_k. \tag{12.12}$$

Equation (12.5) follows at once. This completes the proof.


12.3. The Max-Flow/Min-Cut Theorem 


Remark 12.11. Recall from Remark 6.10 that we define a vertex
cut using two sets $V_1 \subset V$ and $V_2 = V \setminus V_1$ with
$v_1 \in V_1$ and $v_m \in V_2$. This cut is referred to as $(V_1, V_2)$
and consists of all edges connecting a vertex in $V_1$ to a vertex in
$V_2$. It has a capacity of $C(V_1, V_2)$, given in Definition 6.11.


Remark 12.12. Unlike in Chapter 6, we use linear programming 
duality to obtain the following results. 


Lemma 12.13. Let $G = (V, E)$ be a directed graph, and suppose
$V = \{v_1, \ldots, v_m\}$ and $E = \{e_1, \ldots, e_n\}$. The solution to
the maximum flow problem is bounded above by the minimal cut capacity.


Proof. Let $(V_1, V_2)$ be the cut with minimal capacity. Consider the
following solution to the dual problem:

$$w_i^* = \begin{cases}
0 & v_i \in V_1 \\
1 & v_i \in V_2
\end{cases} \tag{12.13}$$

and

$$h_k^* = \begin{cases}
1 & e_k = (v_i, v_j) \text{ and } v_i \in V_1 \text{ and } v_j \in V_2, \\
0 & \text{otherwise}.
\end{cases} \tag{12.14}$$

It is clear that this represents a feasible solution to the dual problem.
Thus, by the strong duality theorem (Theorem 11.32), the objective
function value

$$\sum_{k} u_k h_k^* \tag{12.15}$$

is an upper bound for the primal problem. But this is just the
capacity of the cut with the smallest capacity. This completes the
proof.


Lemma 12.14. In any optimal solution to Eq. (12.4), every directed
path from $v_1$ to $v_m$ must have at least one edge at capacity.


Proof. Note first that Eq. (12.4) is bounded above by the capacity
of the minimal cut, as shown in Lemma 12.13, and since the zero
flow is a feasible solution, we know from Theorem 11.18 that there is
at least one optimal solution to Eq. (12.4) because the problem can
neither be unbounded nor infeasible.

Consider any optimal solution to Eq. (12.5). Then, it corresponds
to some optimal solution to the primal problem, and these solutions
satisfy the Karush-Kuhn-Tucker conditions. We show that in this
primal solution, along each path from $v_1$ to $v_m$ in $G$, at least one
edge must have flow equal to its capacity. To see this, note that for
any edge that does not carry its capacity (that is, $x_k < u_k$) we must
have $h_k = 0$ (to ensure complementary slackness). Suppose there is
some path from $v_1$ to $v_m$ on which no edge carries its capacity, and
let this path have vertices $(u_1, u_2, \ldots, u_s)$ with $v_1 = u_1$ and
$v_m = u_s$. Writing $w_1, \ldots, w_s$ for the dual variables of the
vertices along the path, the dual constraints (with $h_k = 0$ on every
edge of the path) give the following requirements:

$$\begin{aligned}
w_s &> w_1, \\
w_1 &\ge w_2, \\
&\ \ \vdots \\
w_{s-1} &\ge w_s.
\end{aligned}$$

However, this implies that $w_s > w_1 \ge w_2 \ge \cdots \ge w_s$, which is
a contradiction. Therefore, every path from $v_1$ to $v_m$ has at least
one edge at capacity.
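
The dual solution constructed from a cut in the proof of Lemma 12.13 (Eqs. (12.13) and (12.14)) can be checked directly. The Python sketch below does this for every cut of a hypothetical four-vertex digraph; the graph and its capacities are assumptions made for illustration, and the function names are not from any library.

```python
from itertools import combinations

# Hypothetical digraph: vertices v1..v4; edges are (source, destination).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
CAPACITY = [3, 2, 1, 2, 3]
M = 4  # v1 is the source and v4 the sink


def cut_dual_solution(V1):
    """Build (w, h) from a cut (V1, V2) as in Eqs. (12.13) and (12.14)."""
    w = [0 if i in V1 else 1 for i in range(1, M + 1)]
    h = [1 if (src in V1 and dst not in V1) else 0 for src, dst in EDGES]
    return w, h


def dual_feasible(w, h):
    """Check the constraints of the dual problem, Eq. (12.5)."""
    if w[M - 1] - w[0] != 1:
        return False
    return all(w[src - 1] - w[dst - 1] + h[k] >= 0 and h[k] >= 0
               for k, (src, dst) in enumerate(EDGES))


def dual_objective(h):
    return sum(u * hk for u, hk in zip(CAPACITY, h))


# Enumerate every cut: V1 contains v1, never v4, and any subset of the rest.
cuts = {}
all_feasible = True
middle = range(2, M)
for r in range(len(middle) + 1):
    for extra in combinations(middle, r):
        V1 = frozenset((1,) + extra)
        w, h = cut_dual_solution(V1)
        all_feasible = all_feasible and dual_feasible(w, h)
        cuts[V1] = dual_objective(h)  # equals the capacity of the cut

min_cut = min(cuts.values())
print(all_feasible, min_cut)
```

Every cut yields a feasible dual solution whose objective value is the cut capacity, so by weak duality each cut capacity bounds the flow from above. For this graph the smallest such bound is 5.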


Remark 12.15. The proofs of the next two results are identical to 
the ones in Chapter 6. See Theorem 6.15 and its corollary. 


Theorem 12.16. Let $G = (V, E)$ be a directed graph and suppose
$V = \{v_1, \ldots, v_m\}$ and $E = \{e_1, \ldots, e_n\}$. There is at
least one cut $(V_1, V_2)$ so that the flow from $v_1$ to $v_m$ is equal to
the capacity of the cut $(V_1, V_2)$.


Corollary 12.17 (Max-flow/Min-cut theorem). Let $G = (V, E)$
be a directed graph, and suppose $V = \{v_1, \ldots, v_m\}$ and
$E = \{e_1, \ldots, e_n\}$. Then, the maximum flow from $v_1$ to $v_m$ is
equal to the capacity of the minimum cut separating $v_1$ from $v_m$.


Remark 12.18. The derivation of the Ford-Fulkerson algorithm
for finding a maximum flow and its proof of correctness do not
require the linear programming formulation and thus can be found in
Chapter 6. In the remainder of this chapter, we briefly discuss the
min-cost flow problem and its computational complexity and the
relationship between the primal and dual problems in König's theorem.


12.4 Min-Cost Flow and Other Problems 


Definition 12.19 (Min-cost flow problem). Let $\mathbf{A}$ be the
incidence matrix of a directed graph. Assume that flow $x_k$ has cost
$c_k \in \mathbb{R}$ and that the flow on edge $k$ is constrained so that
$l_k \le x_k \le u_k$, where $l_k \ge 0$ is a lower bound on the flow. If
$b_i \in \mathbb{R}$ is the flow produced (or consumed) at vertex $i$,
then the minimum-cost flow problem is

$$\begin{aligned}
\min\ & \mathbf{c}^T\mathbf{x} \\
\text{s.t.}\ & \mathbf{A}\mathbf{x} = \mathbf{b} \\
& \mathbf{l} \le \mathbf{x} \le \mathbf{u},
\end{aligned} \tag{12.16}$$

where $\mathbf{c}$, $\mathbf{b}$, $\mathbf{l}$, and $\mathbf{u}$ are
vectors of the corresponding parameters.


Remark 12.20. Solving problems of this type is outside the scope
of this text, but they are discussed in Ref. [29], with the network
simplex algorithm being one very efficient mechanism of solution.
The following proposition follows from the fact that linear programs
can be solved in polynomial time [121].


Proposition 12.21. The minimum-cost flow problem can be solved 
in polynomial time. 


Remark 12.22. Interestingly, these problems can be solved in
polynomial time even if we require the solution to be in integers. This
is not true of general linear programming problems and is what is
known as being strongly polynomial [124]. We have already seen an
example of this in the integer flow theorem (Corollary 6.31). More
details on why this is the case can be found in Ref. [29]. As a result
of this property, these minimum-cost flow problems find a variety of
uses, which we illustrate in the following.

Example 12.23 (Assignment problems). Consider a group of $m$
people who are to be assigned to $n$ projects. Create a vertex for each
person and project and an extra "dummy" project. Create a complete
directed bipartite graph from people to projects. (We assume
all people can work on all projects. If this is not the case, simply
remove those edges.) Connect a source vertex to each person vertex,
and connect each project vertex to a sink vertex. This is illustrated in
Fig. 12.1. Flow in this graph will be hours assigned from a person
to a project. Table 12.1 shows the parameters assigned to each edge.
The cost per hour can be specific to an individual and project so that
each person may cost a different amount on each project. The upper
bounds on the edges from the source to the people prevent individuals
from being overscheduled. The lower bounds on the edges from the
projects to the sink ensure each project is adequately staffed. The


Fig. 12.1. A minimum cost flow problem that will solve a project assignment
problem.


Table 12.1 A table showing the edge parameters for a simple assignment
problem.

Edge Type            Lower Bound          Upper Bound             Cost
Source to person     0                    Max. hours available    0
Person to project    0                    ∞                       Cost per hour
Person to dummy      0                    ∞                       0
Project to sink      Min. hours needed    ∞                       0
Dummy to sink        0                    ∞                       0


Table 12.2 A table showing the vertex flow production.

Vertex Type          Flow Production (Consumption)
Source               Total person-hours available
Person or project    0
Sink                 (Total person-hours available)


dummy collects excess hours that will not be used to schedule people 
to projects. The vertex flow production is given in Table 12.2. The 
source and sink generate (consume) flow, so they will have positive 
(negative) flow production values. All other vertices neither consume 
nor generate flow. Solving the minimum-cost flow problem will yield 
an assignment (in hours) of people to projects. If the problem is 
infeasible, it means that hour quotas cannot be met; i.e., projects 
demand too many hours. If the problem is feasible, then this is the 
cheapest possible assignment of people to projects. 


Remark 12.24. Many assignment problems can be rephrased in this 
format or have (at their core) a flow problem that can be exploited 
to find solutions quickly. The interested reader can consult Ref. [29]. 


Example 12.25. Consider a simple assignment problem with three 
people and two projects. Both projects require 20 hours of work. We 
will call them Project 1 and Project 2: 


(1) Alice costs $20 per hour and is only qualified to work on 
Project 1. She can work a maximum of 20 hours. 

(2) Bob costs $15 per hour and can work on either project. He can 
work up to 30 hours. 

(3) Charlie costs $10 per hour and can work only on Project 1. He 
can work up to 10 hours. 


The problem is illustrated in the network flow diagram in
Fig. 12.2. Flow will leave Vertex 1 and go to Vertices 7 (the dummy
vertex) and 8. The total number of hours available is 60 hours. This
is the flow generated at Vertex 1. The projects absorb 40 hours.
This is the flow absorbed at Vertex 8 ($b_8 = -40$). The
dummy vertex (Vertex 7) absorbs the remaining 20 hours ($b_7 = -20$).
Let $x_{ij} \ge 0$ be the amount of flow from Vertex $i$ to Vertex $j$.

Fig. 12.2 An assignment problem with three workers and two projects.
Constraints and costs are notionally illustrated on the edges.


The resulting linear programming problem is given by the expression

$$\begin{aligned}
\min\ & 10x_{45} + 15x_{35} + 15x_{36} + 20x_{25} \\
\text{s.t.}\ & x_{12} + x_{13} + x_{14} = 60 \\
& x_{12} - x_{25} - x_{27} = 0 \\
& x_{13} - x_{35} - x_{36} - x_{37} = 0 \\
& x_{14} - x_{45} - x_{47} = 0 \\
& x_{25} + x_{35} + x_{45} - x_{58} = 0 \\
& x_{36} - x_{68} = 0 \\
& x_{27} + x_{37} + x_{47} = 20 \\
& x_{58} + x_{68} = 40 \\
& x_{12} \le 20 \\
& x_{13} \le 30 \\
& x_{14} \le 10 \\
& x_{58} \ge 20 \\
& x_{68} \ge 20 \\
& x_{ij} \ge 0 \quad \forall i, j.
\end{aligned}$$

Fig. 12.3. The solution to the assignment problem has Alice (the most
expensive) not assigned to a project, while Bob and Charlie work on
Projects 1 and 2.


Note that the incidence matrix of the graph in Fig. 12.2 is hidden in 
the flow conservation constraints of this problem. 

Using a computer, we can conclude that the optimal solution has 
Bob assigned to Project 1 for 10 hours and Project 2 for 20 hours. 
Charlie is assigned to Project 1 for 10 hours. Alice (the most expen- 
sive worker) is not assigned hours. All her hours flow to the dummy 
vertex. This is illustrated in Fig. 12.3. 
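
The stated optimum can be checked without a linear programming solver. The following is a minimal sketch, assuming whole-hour assignments only (a restriction of the linear program, which is reasonable here because the problem is a network flow with integer data). The variable names `a1`, `b1`, `b2`, and `c1` are illustrative stand-ins for $x_{25}$, $x_{35}$, $x_{36}$, and $x_{45}$.

```python
from itertools import product

# Data from Example 12.25.
ALICE_MAX, BOB_MAX, CHARLIE_MAX = 20, 30, 10
PROJECT_HOURS = 20  # each project needs exactly 20 hours

best_cost, best_plan = None, None
# a1, b1, c1: hours Alice, Bob, and Charlie spend on Project 1.
for a1, b1, c1 in product(range(ALICE_MAX + 1),
                          range(BOB_MAX + 1),
                          range(CHARLIE_MAX + 1)):
    if a1 + b1 + c1 != PROJECT_HOURS:
        continue  # Project 1 must receive exactly 20 hours
    b2 = PROJECT_HOURS  # only Bob is qualified for Project 2
    if b1 + b2 > BOB_MAX:
        continue  # Bob cannot exceed 30 hours in total
    cost = 20 * a1 + 15 * (b1 + b2) + 10 * c1
    if best_cost is None or cost < best_cost:
        best_cost = cost
        best_plan = {"Alice->P1": a1, "Bob->P1": b1,
                     "Bob->P2": b2, "Charlie->P1": c1}

print(best_cost, best_plan)
```

Under these assumptions, the search returns the assignment described above, with Alice's 20 hours left over for the dummy vertex.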


12.5 The Problem of Generalizing König’s Theorem
and Duality


Remark 12.26. In Chapter 6, we noted that König’s theorem does
not hold in general. To see this, consider $K_3$ (see Fig. 12.4). In this
case, the general inequality that the cardinality of the maximal
matching is at most the cardinality of the minimal covering does hold
(and this will always hold), but we do not have equality.


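Remark 12.27. Let $G = (V, E)$ be a (bipartite) graph with $V =
\{v_1, \dots, v_m\}$ and $E = \{e_1, \dots, e_n\}$. The minimal vertex covering
problem for $G$ can be written as the integer programming problem

$$
\begin{aligned}
\min\quad & x_1 + \cdots + x_m \\
\text{s.t.}\quad & x_i + x_j \geq 1 \quad \forall \{v_i, v_j\} \in E \\
& x_i \in \{0, 1\} \quad \forall i = 1, \dots, m.
\end{aligned}
\tag{12.17}
$$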

[Figure 12.4: the graph $K_3$ drawn twice. Left: a minimal covering (cardinality 2). Right: a maximal matching (cardinality 1).]

Fig. 12.4 In general, the cardinality of a maximal matching is not the same
as the cardinality of a minimal vertex covering, though the inequality that the
cardinality of the maximal matching is at most the cardinality of the minimal
covering does hold.


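Here, $x_i$ acts as a boolean (true/false) variable that determines
whether $v_i$ is in the cover or not. If $\mathbf{A}$ is the incidence matrix for
$G$, then this problem can be written in matrix notation as

$$
\begin{aligned}
\min\quad & \mathbf{1}^T\mathbf{x} \\
\text{s.t.}\quad & \mathbf{A}^T\mathbf{x} \geq \mathbf{1} \\
& \mathbf{x} \in \{0, 1\}^m,
\end{aligned}
\tag{12.18}
$$

where $\mathbf{1}$ is a vector consisting of only ones of appropriate length.
For simplicity, consider the relaxation of the integer program to
a linear program:

$$
\begin{aligned}
\min\quad & \mathbf{1}^T\mathbf{x} \\
\text{s.t.}\quad & \mathbf{A}^T\mathbf{x} \geq \mathbf{1} \\
& \mathbf{0} \leq \mathbf{x} \leq \mathbf{1}.
\end{aligned}
\tag{12.19}
$$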
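We can rewrite this as

$$
\begin{aligned}
\min\quad & \mathbf{1}^T\mathbf{x} & & \text{Dual Variables} \\
\text{s.t.}\quad & \mathbf{A}^T\mathbf{x} \geq \mathbf{1} & & \mathbf{w} \\
& -\mathbf{I}\mathbf{x} \geq -\mathbf{1} & & \mathbf{u} \\
& \mathbf{x} \geq \mathbf{0}.
\end{aligned}
$$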


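The dual variables are shown to the right of the problem and $\mathbf{I}^T = \mathbf{I}$
because it’s an identity matrix. The dual problem can be read from
Fig. 11.4:

$$
\begin{aligned}
\max\quad & \mathbf{1}^T\mathbf{w} - \mathbf{1}^T\mathbf{u} \\
\text{s.t.}\quad & \mathbf{A}\mathbf{w} - \mathbf{I}\mathbf{u} \leq \mathbf{1} \\
& \mathbf{w}, \mathbf{u} \geq \mathbf{0}.
\end{aligned}
$$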

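Remember that we are dealing with a dual problem generated from a
relaxation of an integer programming problem of interest. To recover
a sensible integer program from the dual, we note that setting $\mathbf{u} = \mathbf{0}$
will make the objective function larger. If we make that assumption,
then we know that $\mathbf{w} \leq \mathbf{1}$ because of the structure of the constraints,
and we can write

$$
\begin{aligned}
\max\quad & \mathbf{1}^T\mathbf{w} \\
\text{s.t.}\quad & \mathbf{A}\mathbf{w} \leq \mathbf{1} \\
& \mathbf{0} \leq \mathbf{w} \leq \mathbf{1}.
\end{aligned}
$$

Transforming this back to an integer program yields

$$
\begin{aligned}
\max\quad & \mathbf{1}^T\mathbf{w} \\
\text{s.t.}\quad & \mathbf{A}\mathbf{w} \leq \mathbf{1} \\
& \mathbf{w} \in \{0, 1\}^n,
\end{aligned}
$$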


which is the maximum matching problem. Because we know that
relaxations will always have optimal objective functions that are at
least as good as their corresponding integer programming problems,
we have proved that

$$
\mathrm{IP}_{\text{match}} \leq \mathrm{LP}_{\text{match}} \leq \mathrm{LP}_{\text{cover}} \leq \mathrm{IP}_{\text{cover}}. \tag{12.20}
$$

The middle inequality follows from weak duality and our assumption
on the variables $\mathbf{u}$. However, we have seen for the special case of
bipartite graphs that these inequalities must be equalities. This is
König’s theorem.
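
The chain of inequalities in Eq. (12.20) can be illustrated numerically on $K_3$. The following is a minimal sketch, assuming brute-force enumeration is acceptable for such a small graph. The helper names `max_matching_size` and `min_cover_size` are hypothetical, and the fractional point with every variable equal to 1/2 is a hand-picked candidate rather than the output of a solver.

```python
from fractions import Fraction
from itertools import combinations

def max_matching_size(edges):
    """Size of a largest set of pairwise vertex-disjoint edges."""
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            touched = [v for e in subset for v in e]
            if len(touched) == len(set(touched)):
                return k
    return 0

def min_cover_size(vertices, edges):
    """Size of a smallest vertex set that touches every edge."""
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                return k

vertices = [1, 2, 3]
k3_edges = [(1, 2), (1, 3), (2, 3)]
path_edges = [(1, 2), (2, 3)]  # K3 with one edge removed (bipartite)

ip_match = max_matching_size(k3_edges)
ip_cover = min_cover_size(vertices, k3_edges)

# Candidate solutions of the two relaxations on K3: every variable is 1/2.
half = Fraction(1, 2)
x = {v: half for v in vertices}   # cover relaxation
w = {e: half for e in k3_edges}   # matching relaxation
cover_feasible = all(x[u] + x[v] >= 1 for u, v in k3_edges)
match_feasible = all(sum(w[e] for e in k3_edges if v in e) <= 1
                     for v in vertices)
lp_value = sum(x.values())        # same value as sum(w.values())

print(ip_match, lp_value, ip_cover)
print(max_matching_size(path_edges), min_cover_size(vertices, path_edges))
```

Both fractional points are feasible and have the same objective value, so by weak duality each is optimal for its relaxation. On the bipartite path, the matching and covering sizes coincide, as König’s theorem requires.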


12.6 Chapter Notes 


The interplay between graph structures and linear and integer pro- 
gramming has led to interesting theoretical and practical results. 
The minimum-cost flow problem can be solved in a number of different
ways, including the network simplex method, the out-of-kilter
algorithm [125], and the push-relabel algorithm [126]. There is no 
single original network simplex paper, though the idea can be traced 
as far back as Koopmans [127], with major work on the polynomial
method by Orlin, Plotkin, and Tardos [128], Orlin [129], and
Tarjan [130]. The out-of-kilter algorithm was developed by Fulkerson
[125] (the same Fulkerson of the Ford—Fulkerson algorithm) and is 
unique in that it obtains feasible solutions to both the primal and 
dual linear programming problems underlying the minimum-cost flow 
problem, but these solutions do not satisfy complementary slackness. 
The algorithm proceeds to correct this problem, ultimately arriving 
at a point satisfying the KKT conditions. See Ref. [29] for a complete
description. The push-relabel algorithm is the current fastest
solution method for maximum flow problems with a running time
of $O(|V|^2|E|)$ compared to Theorem 6.29, which gives a running
time of $O(|V||E|^2)$ for the Edmonds–Karp algorithm. Interestingly,
Dijkstra’s algorithm can even be derived from the dual problem of an 
appropriately phrased linear programming problem. An alternative 
proof of that algorithm’s correctness uses the KKT conditions, just 
as we did for the maximum flow problem in this chapter. It is worth 
noting that many of these citations occur from 1988 onward, showing 
that this is a recent area of active research. 

The general matching problem (and its generalization, the 
weighted matching problem) on arbitrary graphs is also a much studied
area of combinatorial optimization. The blossom algorithm developed
by Edmonds [131] (of the Edmonds–Karp algorithm) is known
to solve this problem in polynomial time $O(|V||E|^2)$. This is a second
example of a combinatorial optimization problem that can be
phrased as an integer programming problem that can be solved in 
polynomial time. (Integer programming problems, in general, cannot 
be solved in polynomial time unless P = NP, which is still open.) The 
fact that this is the case is important because it arises from a different 
mechanism than the polynomial time complexity of the network flow 
problem. As a result, the so-called matching polytope [132], describ- 
ing the feasible region of the matching problem, has been much stud- 
ied [133] because of this polynomial time property. Works dedicated 
to integer and linear programming [29, 59, 115, 134] will have addi- 
tional information on the relationship between graphs and networks 
and linear programming. 
There are many other applications of linear programming (and
integer programming) to the study of graphs. Please consult Refs. 
[29, 59, 115, 134]. 


12.7 Exercises 


Exercise 12.1 

Show that the maximum flow problem can be rephrased as a 
minimum-cost flow problem. Consequently, prove that it is a spe- 
cial case of a minimum-cost flow problem. 


Exercise 12.2 

For the graph $K_3$, find the integer programming problems and
their relaxations for the minimum covering and maximum matching
problems. Find optimal solutions to each problem, and illustrate
Eq. (12.20).


Exercise 12.3 

Remove one edge from $K_3$ to form a bipartite graph and repeat
Exercise 12.2. Illustrate König’s theorem in this case with the
integer/linear programs.


Appendix A 


Fields, Vector Spaces, and Matrices 


Remark A.1. This appendix introduces the essentials of linear alge- 
bra needed for the results on algebraic graph theory presented in 
the main part of the book. Proofs are omitted for brevity. Reading 
Chapter 8 will be helpful for understanding the definition of group, 
which is used throughout this appendix. The Perron—Frobenius the- 
orem, a useful result in linear algebra, is not presented in this chap- 
ter. Instead, it is presented where it is needed in the main body of 
the text. 


A.1 Matrices and Row and Column Vectors 


Definition A.2 (Field). A field (or number field) is a tuple
$(S, +, \cdot, 0, 1)$ where:


(1) $(S, +)$ is a commutative group with unit 0;

(2) $(S \setminus \{0\}, \cdot)$ is a commutative group with unit 1;

(3) the operation $\cdot$ distributes over the operation $+$ so that if $a_1$, $a_2$,
and $a_3$ are elements of $S$, then $a_1 \cdot (a_2 + a_3) = a_1 \cdot a_2 + a_1 \cdot a_3$.


Example A.3. The archetypal example of a field is the field of real 
numbers R, with addition and multiplication playing the expected 
roles. Another common field is the field of complex numbers $\mathbb{C}$ (numbers
of the form $a + bi$ with $i = \sqrt{-1}$ the imaginary unit), with their
addition and multiplication rules defined as expected.

Definition A.4 (Matrix). An $m \times n$ matrix is a rectangular array
of values (scalars) drawn from a field. If $\mathbb{F}$ is the field, we write $\mathbb{F}^{m \times n}$
to denote the set of $m \times n$ matrices with entries drawn from $\mathbb{F}$.


Example A.5. Here is an example of a $2 \times 3$ matrix drawn from
$\mathbb{R}^{2 \times 3}$:

$$
\begin{bmatrix} 3 & 1 & \frac{7}{2} \\ 2 & \sqrt{2} & 5 \end{bmatrix}.
$$

Remark A.6. We denote the element at position $(i, j)$ of matrix $\mathbf{A}$
as $\mathbf{A}_{ij}$. Thus, in the example above, $\mathbf{A}_{21} = 2$.


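Definition A.7 (Matrix addition). If $\mathbf{A}$ and $\mathbf{B}$ are both in $\mathbb{F}^{m \times n}$,
then $\mathbf{C} = \mathbf{A} + \mathbf{B}$ is the matrix sum of $\mathbf{A}$ and $\mathbf{B}$ in $\mathbb{F}^{m \times n}$ and

$$
\mathbf{C}_{ij} = \mathbf{A}_{ij} + \mathbf{B}_{ij} \quad \text{for } i = 1, \dots, m \text{ and } j = 1, \dots, n. \tag{A.1}
$$

Here, $+$ is the field operation for addition.

Example A.8.

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1 + 5 & 2 + 6 \\ 3 + 7 & 4 + 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}. \tag{A.2}
$$
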
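Definition A.9 (Scalar-matrix multiplication). If $\mathbf{A}$ is a
matrix from $\mathbb{F}^{m \times n}$ and $c \in \mathbb{F}$, then $\mathbf{B} = c\mathbf{A} = \mathbf{A}c$ is the scalar-matrix
product of $c$ and $\mathbf{A}$ in $\mathbb{F}^{m \times n}$ and

$$
\mathbf{B}_{ij} = c\mathbf{A}_{ij} \quad \text{for } i = 1, \dots, m \text{ and } j = 1, \dots, n. \tag{A.3}
$$

Example A.10. Let

$$
\mathbf{A} = \begin{bmatrix} 3 & 7 \\ 6 & 3 \end{bmatrix}.
$$

When we multiply by the scalar $3 \in \mathbb{R}$, we obtain

$$
3 \cdot \begin{bmatrix} 3 & 7 \\ 6 & 3 \end{bmatrix} = \begin{bmatrix} 9 & 21 \\ 18 & 9 \end{bmatrix}.
$$
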
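
The entrywise definitions in Eqs. (A.1) and (A.3) translate directly into code. The following is a minimal sketch, assuming matrices are stored as lists of rows. The function names `mat_add` and `scalar_mul` are illustrative and not taken from any library.

```python
def mat_add(A, B):
    """Entrywise sum C[i][j] = A[i][j] + B[i][j] of two equal-shape matrices."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    """Scalar-matrix product B[i][j] = c * A[i][j]."""
    return [[c * a for a in row] for row in A]

total = mat_add([[1, 2], [3, 4]], [[5, 6], [7, 8]])
scaled = scalar_mul(3, [[3, 7], [6, 3]])
print(total)
print(scaled)
```

Plain integers stand in for field elements here; any Python type supporting `+` and `*` (for example, `fractions.Fraction` or `complex`) could be used in the same way.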
Definition A.11 (Row/Column vector). A $1 \times n$ matrix is
called a row vector, and an $m \times 1$ matrix is called a column vector.
Every vector will be thought of as a column vector unless otherwise
noted. A column vector $\mathbf{x}$ in $\mathbb{R}^{n \times 1}$ (or $\mathbb{R}^n$) is: $\mathbf{x} = (x_1, \dots, x_n)$.

Remark A.12. It should be clear that any row of matrix $\mathbf{A}$ could
be considered a row vector and any column of $\mathbf{A}$ could be considered
a column vector. The $i$th row of $\mathbf{A}$ is denoted as $\mathbf{A}_{i\cdot}$, while the $j$th
column is denoted as $\mathbf{A}_{\cdot j}$. Also, any row/column vector is nothing
more sophisticated than a tuple of numbers (a point in space). You
are free to think of these things however you like.


Definition A.13 ($n$-dimensional vector space $\mathbb{F}^n$). Let $\mathbb{F}$ be a
field (e.g., the real numbers). The set of all vectors $\mathbf{x} = (x_1, \dots, x_n)$
with $x_i \in \mathbb{F}$ is the vector space $\mathbb{F}^n$. The dimension of $\mathbb{F}^n$ is $n$.


Proposition A.14. The vector space $\mathbb{F}^n$ is a group under vector
addition, with the zero vector $\mathbf{0} = (0, 0, \dots, 0)$ being the additive
identity. Furthermore, $\mathbb{F}^n$ is closed under scalar multiplication.


Remark A.15. The vector space $\mathbb{F}^n$ is usually the first one students
learn. In particular, we speak of $\mathbb{R}^n$ in most classes on matrices. In
linear algebra, we learn that more abstract things can be made into
vector spaces, all of which are isomorphic to some vector space $\mathbb{F}^n$.
For the purpose of this book, we confine ourselves to these archetypal
vector spaces.


A.2 Matrix Multiplication 

Definition A.16 (Dot product). If $\mathbf{x}, \mathbf{y} \in \mathbb{F}^n$ are two
$n$-dimensional vectors, then their dot product is

$$
\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i, \tag{A.4}
$$

where $x_i$ is the $i$th component of the vector $\mathbf{x}$.


Remark A.17. The dot product is an example of a more gen- 
eral concept called an inner product, which maps two vectors to a 
scalar. Not all inner products behave according to Eq. (A.4), and 
the definition of the inner product will be vector-space- and context- 
dependent. 
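
Definition A.18 (Matrix multiplication). If $\mathbf{A} \in \mathbb{R}^{m \times n}$ and
$\mathbf{B} \in \mathbb{R}^{n \times p}$, then $\mathbf{C} = \mathbf{A}\mathbf{B}$ is the matrix product of $\mathbf{A}$ and $\mathbf{B}$ and

$$
\mathbf{C}_{ij} = \mathbf{A}_{i\cdot} \cdot \mathbf{B}_{\cdot j}. \tag{A.5}
$$

Note that $\mathbf{A}_{i\cdot} \in \mathbb{R}^{1 \times n}$ (an $n$-dimensional vector) and $\mathbf{B}_{\cdot j} \in \mathbb{R}^{n \times 1}$
(another $n$-dimensional vector), thus making the dot product meaningful.
Note also that $\mathbf{C} \in \mathbb{R}^{m \times p}$.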


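Example A.19.

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1(5) + 2(7) & 1(6) + 2(8) \\ 3(5) + 4(7) & 3(6) + 4(8) \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}. \tag{A.6}
$$
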
Remark A.20. Note that we cannot multiply any pair of arbitrary
matrices. If we have the product $\mathbf{A}\mathbf{B}$ for two matrices $\mathbf{A}$ and $\mathbf{B}$, then
the number of columns in $\mathbf{A}$ must be equal to the number of rows
in $\mathbf{B}$.
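
Equation (A.5) says that each entry of the product is a row-by-column dot product. The following is a minimal sketch, assuming matrices are stored as lists of rows. The names `dot` and `mat_mul` are illustrative.

```python
def dot(x, y):
    """Dot product of two vectors of equal length, as in Eq. (A.4)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def mat_mul(A, B):
    """Product C = AB with C[i][j] = (row i of A) . (column j of B)."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("columns of A must equal rows of B")
    columns = list(zip(*B))  # columns of B as tuples
    return [[dot(row, col) for col in columns] for row in A]

C = mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C)
```

The shape check mirrors the requirement that the number of columns of the left factor equals the number of rows of the right factor; a mismatched pair raises an error instead of returning a meaningless result.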


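Definition A.21 (Matrix transpose). If $\mathbf{A} \in \mathbb{F}^{m \times n}$ is an $m \times n$
matrix, then the transpose of $\mathbf{A}$, denoted by $\mathbf{A}^T$, is an $n \times m$ matrix
defined as

$$
\mathbf{A}^T_{ij} = \mathbf{A}_{ji}. \tag{A.7}
$$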


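Example A.22.

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}. \tag{A.8}
$$

Essentially, we are just reading down the columns of $\mathbf{A}$ to obtain its
transpose.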


Remark A.23. The matrix transpose is a particularly useful operation
and makes it easy to transform column vectors into row vectors,
which enables multiplication. For example, suppose $\mathbf{x}$ is an $n \times 1$
column vector (i.e., $\mathbf{x}$ is a vector in $\mathbb{F}^n$), and suppose $\mathbf{y}$ is an $n \times 1$
column vector. Then,

$$
\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}. \tag{A.9}
$$
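
The transpose, and the way it turns a dot product into a row-times-column product, can be sketched as follows. This is a minimal illustration, assuming matrices are lists of rows and column vectors are $n \times 1$ lists. The sample matrix and vectors are arbitrary choices.

```python
def transpose(A):
    """Return the transpose: entry (i, j) of the result is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [4, 5, 6]]   # 2 x 3
x = [[1], [2], [3]]          # 3 x 1 column vector
y = [[4], [5], [6]]          # 3 x 1 column vector

A_T = transpose(A)           # 3 x 2
x_T = transpose(x)           # 1 x 3 row vector

# Row-times-column product x^T y, written out entry by entry.
x_T_y = sum(x_T[0][k] * y[k][0] for k in range(len(y)))

print(A_T)
print(x_T, x_T_y)
```

Transposing twice returns the original matrix, which is a quick sanity check on any implementation.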


A.3 Special Matrices


Definition A.24 (Square matrix). Any matrix $\mathbf{A} \in \mathbb{F}^{n \times n}$ for a
field $\mathbb{F}$ is called a square matrix.


Remark A.25. There are many special (and useful) square matri- 
ces, as we discuss in the following. 


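Definition A.26 (Identity matrix). The $n \times n$ identity matrix is

$$
\mathbf{I}_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}. \tag{A.10}
$$

Here, 1 is the multiplicative unit in the field $\mathbb{F}$ from which the matrix
entries are drawn.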


Definition A.27 (Zero matrix). The $n \times n$ zero matrix is an $n \times n$
matrix consisting entirely of 0’s (the zero in the field).


Definition A.28 (Symmetric matrix). Let $\mathbf{M} \in \mathbb{F}^{n \times n}$ be a
matrix. The matrix $\mathbf{M}$ is symmetric if $\mathbf{M} = \mathbf{M}^T$.


Example A.29. Suppose that 


ee bw 
NIF w 


This matrix is symmetric. 


Definition A.30 (Diagonal matrix). A diagonal matrix is a
(square) matrix with the property that $\mathbf{D}_{ij} = 0$ for $i \neq j$ and
$\mathbf{D}_{ii}$ may take any value in the field on which $\mathbf{D}$ is defined.


Example A.31. Consider the matrix

$$
\mathbf{D} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 6 \end{bmatrix}.
$$

This is a diagonal matrix.

Remark A.32. A diagonal matrix has (usually) nonzero entries only
on its main diagonal. Of course, some of its diagonal entries may be
zero.


A.4 Matrix Inverse 


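Definition A.33 (Invertible matrix). Let $\mathbf{A} \in \mathbb{F}^{n \times n}$ be a square
matrix. If there is a matrix $\mathbf{A}^{-1}$ such that

$$
\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_n, \tag{A.11}
$$

then matrix $\mathbf{A}$ is said to be invertible (or nonsingular) and $\mathbf{A}^{-1}$ is
called its inverse. If $\mathbf{A}$ is not invertible, it is called a singular matrix.


Definition A.34 (Matrix power). If $\mathbf{A} \in \mathbb{F}^{n \times n}$, then $\mathbf{A}^k =
\mathbf{A}^{k-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{k-1}$ for a positive integer $k$. We define $\mathbf{A}^0 = \mathbf{I}_n$ for
any nonsingular square matrix $\mathbf{A}$.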


Theorem A.35. If $\mathbf{A} \in \mathbb{F}^{n \times n}$ is invertible, then $\mathbf{A}^{-1}$ is unique.


Remark A.36. Definition A.33 and Theorem A.35 show that the 
inverse of a square matrix is unique if it exists and there is no differ- 
ence between a left inverse and a right inverse. 


Remark A.37. The set of $n \times n$ invertible matrices over $\mathbb{R}$ is
denoted by $\mathrm{GL}(n, \mathbb{R})$. It forms a group called the general linear group
under matrix multiplication, with $\mathbf{I}_n$ as the unit. The general linear
group over a field $\mathbb{F}$ is defined analogously.


Proposition A.38. If both $\mathbf{A}$ and $\mathbf{B}$ are invertible in $\mathbb{F}^{n \times n}$, then
$\mathbf{A}\mathbf{B}$ is invertible and $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.


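A.5 Linear Combinations, Span, and Linear Independence


Definition A.39. Let $\mathbf{v}_1, \dots, \mathbf{v}_m$ be (column) vectors in $\mathbb{F}^n$, and let
$\alpha_1, \dots, \alpha_m \in \mathbb{F}$ be scalars. Then,

$$
\alpha_1\mathbf{v}_1 + \cdots + \alpha_m\mathbf{v}_m \tag{A.12}
$$

is a linear combination of the vectors $\mathbf{v}_1, \dots, \mathbf{v}_m$.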

Definition A.40 (Span). Suppose $W = \{\mathbf{v}_1, \dots, \mathbf{v}_m\}$ is a set of
vectors in $\mathbb{F}^n$. Then the span of $W$ is the set

$$
\operatorname{span}(W) = \{\mathbf{y} \in \mathbb{F}^n : \mathbf{y} \text{ is a linear combination of vectors in } W\}. \tag{A.13}
$$

Definition A.41 (Linear independence). Let $\mathbf{v}_1, \dots, \mathbf{v}_m$ be vectors
in $\mathbb{F}^n$. The vectors $\mathbf{v}_1, \dots, \mathbf{v}_m$ are linearly dependent if there
exist $\alpha_1, \dots, \alpha_m \in \mathbb{F}$, not all zero, such that

$$
\alpha_1\mathbf{v}_1 + \cdots + \alpha_m\mathbf{v}_m = \mathbf{0}. \tag{A.14}
$$

If the set of vectors $\mathbf{v}_1, \dots, \mathbf{v}_m$ is not linearly dependent, then they
are linearly independent and Eq. (A.14) holds just in case $\alpha_i = 0$ for
all $i = 1, \dots, m$. Here, $\mathbf{0}$ is the zero vector in $\mathbb{F}^n$ and 0 is the zero
element in the field.


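Example A.42. In $\mathbb{R}^3$, consider the vectors

$$
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.
$$

We can show that these vectors are linearly independent: Suppose
there are values $\alpha_1, \alpha_2, \alpha_3 \in \mathbb{R}$ such that

$$
\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \mathbf{0}.
$$

Then,

$$
\begin{bmatrix} \alpha_1 \\ \alpha_1 \\ 0 \end{bmatrix} + \begin{bmatrix} \alpha_2 \\ 0 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} 0 \\ \alpha_3 \\ \alpha_3 \end{bmatrix} = \begin{bmatrix} \alpha_1 + \alpha_2 \\ \alpha_1 + \alpha_3 \\ \alpha_2 + \alpha_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
$$

Thus, we have the system of linear equations

$$
\begin{aligned}
\alpha_1 + \alpha_2 &= 0 \\
\alpha_1 + \alpha_3 &= 0 \\
\alpha_2 + \alpha_3 &= 0.
\end{aligned}
$$

From the third equation, we see that $\alpha_3 = -\alpha_2$. Substituting this
into the second equation, we obtain two equations:

$$
\begin{aligned}
\alpha_1 + \alpha_2 &= 0 \\
\alpha_1 - \alpha_2 &= 0.
\end{aligned}
$$

This implies that $\alpha_1 = \alpha_2$ and $2\alpha_1 = 0$ or $\alpha_1 = \alpha_2 = 0$. Therefore,
$\alpha_3 = 0$, and thus, these vectors are linearly independent.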
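
The hand computation above amounts to row reduction, which can be automated. The following is a minimal sketch, assuming exact rational arithmetic via Python's `fractions` module. The names `rank` and `linearly_independent` are illustrative; the test used is that a set of vectors is independent exactly when elimination produces one pivot per vector.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination on the given rows."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(M[0]) if M else 0
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True when no nontrivial linear combination of the vectors is zero."""
    return rank(vectors) == len(vectors)

print(linearly_independent([(1, 1, 0), (1, 0, 1), (0, 1, 1)]))
print(linearly_independent([(1, 2), (2, 4)]))
```

The second call uses a pair in which one vector is twice the other, so the check reports dependence.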


Remark A.43. It is worth noting that any set of vectors becomes
linearly dependent if the zero vector $\mathbf{0}$ is added to it.


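Example A.44. Consider the vectors

$$
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}.
$$

Determining linear independence requires us to find solutions to the
equation

$$
\alpha_1 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \alpha_2 \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

or the system of equations

$$
\begin{aligned}
\alpha_1 + 4\alpha_2 &= 0, \\
2\alpha_1 + 5\alpha_2 &= 0, \\
3\alpha_1 + 6\alpha_2 &= 0.
\end{aligned}
$$

Thus, $\alpha_1 = -4\alpha_2$. Substituting this into the second and third equations
yields

$$
\begin{aligned}
-3\alpha_2 &= 0, \\
-6\alpha_2 &= 0.
\end{aligned}
$$

Thus, $\alpha_2 = 0$ and, consequently, $\alpha_1 = 0$. Thus, the vectors are linearly
independent.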
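
Example A.45. Consider the vectors

$$
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} 5 \\ 6 \end{bmatrix}.
$$

As before, we can derive the system of equations

$$
\begin{aligned}
\alpha_1 + 3\alpha_2 + 5\alpha_3 &= 0, \\
2\alpha_1 + 4\alpha_2 + 6\alpha_3 &= 0.
\end{aligned}
$$

We have more unknowns than equations, so we suspect there may be
many solutions to this system of equations. From the first equation,
we see that $\alpha_1 = -3\alpha_2 - 5\alpha_3$. Consequently, we can substitute this
into the second equation to obtain

$$
-6\alpha_2 - 10\alpha_3 + 4\alpha_2 + 6\alpha_3 = -2\alpha_2 - 4\alpha_3 = 0.
$$

Thus, $\alpha_2 = -2\alpha_3$ and $\alpha_1 = 6\alpha_3 - 5\alpha_3 = \alpha_3$, which we obtain by
substituting the expression for $\alpha_2$ into the expression for $\alpha_1$. We
conclude that $\alpha_3$ can be anything we like; it is a free variable. Set
$\alpha_3 = 1$. Then, $\alpha_2 = -2$ and $\alpha_1 = 1$. We can now confirm that this set
of values creates a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ equal to $\mathbf{0}$.
We compute

$$
1 \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} - 2 \cdot \begin{bmatrix} 3 \\ 4 \end{bmatrix} + 1 \cdot \begin{bmatrix} 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 - 6 + 5 \\ 2 - 8 + 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
$$

Thus, the vectors are not linearly independent and they must be
linearly dependent.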
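
The free variable in Example A.45 generates a whole family of dependency relations, and each member can be checked mechanically. The following is a minimal sketch, assuming the vectors $(1, 2)$, $(3, 4)$, and $(5, 6)$ as read off from the coefficients of the system above. The function name `linear_combination` is illustrative.

```python
def linear_combination(coeffs, vectors):
    """Return sum_i coeffs[i] * vectors[i], computed componentwise."""
    dim = len(vectors[0])
    return tuple(sum(a * v[k] for a, v in zip(coeffs, vectors))
                 for k in range(dim))

v1, v2, v3 = (1, 2), (3, 4), (5, 6)

# With a3 = t free, the derivation gives a1 = t and a2 = -2t.
results = {t: linear_combination((t, -2 * t, t), (v1, v2, v3))
           for t in (1, -3, 7)}
print(results)
```

Every choice of the free variable yields the zero vector, and any nonzero choice gives a nontrivial relation, which is exactly what linear dependence requires.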


A.6 Basis 


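Definition A.46 (Basis). Let $\mathcal{B} = \{\mathbf{v}_1, \dots, \mathbf{v}_m\}$ be a set of vectors
in $\mathbb{F}^n$. The set $\mathcal{B}$ is called a basis of $\mathbb{F}^n$ if $\mathcal{B}$ is a linearly independent
set of vectors and every vector in $\mathbb{F}^n$ is in the span of $\mathcal{B}$. That is, for
any vector $\mathbf{w} \in \mathbb{F}^n$, we can find scalar values $\alpha_1, \dots, \alpha_m \in \mathbb{F}$ such
that

$$
\mathbf{w} = \sum_{i=1}^{m} \alpha_i\mathbf{v}_i. \tag{A.15}
$$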
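
Example A.47. We can show that the vectors

$$
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
$$

form a basis of $\mathbb{R}^3$. We already know that the vectors are linearly
independent. To show that every vector in $\mathbb{R}^3$ is in their span, choose
an arbitrary vector in $\mathbb{R}^3$: $(a, b, c)$. Then, we hope to find coefficients
$\alpha_1$, $\alpha_2$, and $\alpha_3$ so that

$$
\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.
$$

Expanding this, we must find $\alpha_1$, $\alpha_2$, and $\alpha_3$ so that

$$
\begin{bmatrix} \alpha_1 \\ \alpha_1 \\ 0 \end{bmatrix} + \begin{bmatrix} \alpha_2 \\ 0 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} 0 \\ \alpha_3 \\ \alpha_3 \end{bmatrix} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.
$$

A little effort (in terms of algebra) will show that

$$
\begin{aligned}
\alpha_1 &= \tfrac{1}{2}(a + b - c), \\
\alpha_2 &= \tfrac{1}{2}(a - b + c), \\
\alpha_3 &= \tfrac{1}{2}(-a + b + c).
\end{aligned}
\tag{A.16}
$$

Thus, the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $\mathbb{R}^3$.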
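
The coefficient formulas in Eq. (A.16) can be spot-checked by substituting them back into the linear combination. The following is a minimal sketch, assuming the basis vectors $(1, 1, 0)$, $(1, 0, 1)$, and $(0, 1, 1)$ from the example. The function names `coefficients` and `combine` are illustrative.

```python
from fractions import Fraction

BASIS = ((1, 1, 0), (1, 0, 1), (0, 1, 1))

def coefficients(a, b, c):
    """Coefficients from Eq. (A.16) for the target vector (a, b, c)."""
    half = Fraction(1, 2)
    return (half * (a + b - c), half * (a - b + c), half * (-a + b + c))

def combine(coeffs):
    """Evaluate coeffs[0]*v1 + coeffs[1]*v2 + coeffs[2]*v3 componentwise."""
    return tuple(sum(alpha * v[k] for alpha, v in zip(coeffs, BASIS))
                 for k in range(3))

for target in [(1, 0, 0), (2, -5, 7), (0, 0, 0)]:
    print(target, coefficients(*target), combine(coefficients(*target)))
```

Each printed combination reproduces its target, which is the spanning property the example asserts. A finite set of test vectors is evidence, not a proof; the algebra in the example is what establishes the general case.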


Remark A.48. Note that there are three vectors in this basis for $\mathbb{R}^3$.
In general, a basis for $\mathbb{F}^n$ will have exactly $n$ vectors.


Definition A.49 (Standard basis). The standard basis of $\mathbb{F}^n$
consists of the $n$ columns (or rows) of the identity matrix $\mathbf{I}_n$. In most
(but not all) texts, they are written as

$$
\mathbf{e}_1 = (1, 0, \dots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \dots, 0), \quad \dots, \quad \mathbf{e}_n = (0, 0, \dots, 1).
$$


A.7 Orthogonality in $\mathbb{R}^n$


Remark A.50. When two vectors are perpendicular to each other
in some vector space, they are said to be orthogonal. We only need
orthogonality in $\mathbb{R}^n$, so we formally define the idea for vectors with
real entries. The definitions are similar but slightly different for vectors
in, for example, $\mathbb{C}^n$. In particular, the dot product used in the
following definition is replaced with a suitable inner product.


Definition A.51 (Orthogonality). Two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ are
orthogonal if $\mathbf{x} \cdot \mathbf{y} = 0$.


Example A.52. It is easy to see that the vectors (1,0) and (0, 1) 
are orthogonal. 


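Definition A.53 (Norm). The norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is given
by

$$
\|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}}.
$$

Definition A.54 (Unit vector). A vector $\mathbf{x} \in \mathbb{R}^n$ is a unit vector
if $\|\mathbf{x}\| = 1$.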


Remark A.55. If we can define a vector norm in an arbitrary vector
space $\mathbb{F}^n$, then any vector with norm 1 is called a unit vector. Even
though we only need these ideas for vectors with real entries, it is
nice to know that all of these definitions can be suitably generalized.


Definition A.56 (Orthogonal/Orthonormal basis). A basis $\mathcal{B}$
is orthogonal if all the vectors in it are mutually orthogonal.
It is orthonormal if all the vectors are mutually orthogonal
and they are all unit vectors.


Remark A.57. It is easy to see that the standard basis is always
orthonormal in $\mathbb{R}^n$. We will see another example of an orthonormal
basis when we introduce eigenvalues in a later section.


A.8 Row Space and Null Space 


Definition A.58 (Row space). Let $\mathbf{A} \in \mathbb{F}^{m \times n}$. The row space of
$\mathbf{A}$ is the vector space $\mathbb{F}^k$ spanned by the rows of $\mathbf{A}$.

Example A.59. We have already shown in Example A.47 that the
row space of the matrix

$$
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
$$

is the space $\mathbb{R}^3$.


Example A.60. Consider the matrix

$$
\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}.
$$

The rows come from $\mathbb{R}^2$. Consequently, we know that the row space
is at most $\mathbb{R}^2$ (though it could be $\mathbb{R}^1$). By Remark A.48, we know
that the row vectors must be linearly dependent (they are in $\mathbb{R}^2$ and
there are three of them). Therefore, we show that two of the vectors
are linearly independent, and consequently, they must span $\mathbb{R}^2$. To
see this, we solve

$$
\alpha_1 \begin{bmatrix} 1 & 2 \end{bmatrix} + \alpha_2 \begin{bmatrix} 3 & 4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}.
$$

Then, we have the equation system

$$
\begin{aligned}
\alpha_1 + 3\alpha_2 &= 0, \\
2\alpha_1 + 4\alpha_2 &= 0.
\end{aligned}
$$

Multiplying the first equation by $-2$ and adding, we obtain

$$
-2\alpha_2 = 0.
$$

Therefore, $\alpha_2 = 0$, and it follows that $\alpha_1 = 0$. These two vectors are
linearly independent. A similar argument shows that they span $\mathbb{R}^2$.
Therefore, the row space of $\mathbf{A}$ is $\mathbb{R}^2$.


Definition A.61 (Rank). The rank of matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the
dimension of the row space or, equivalently, the number of linearly
independent rows of $\mathbf{A}$.


Definition A.62 (Null space). The null space of a matrix $\mathbf{A} \in
\mathbb{F}^{m \times n}$ is the set of vectors $\mathcal{N}(\mathbf{A})$ such that if $\mathbf{x} \in \mathcal{N}(\mathbf{A})$, then $\mathbf{A}\mathbf{x} = \mathbf{0}$.

Proposition A.63. If the columns of $\mathbf{A}$ are linearly independent,
then the null space of $\mathbf{A}$ consists of only the zero vector (and has a
dimension of zero).


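Example A.64. Consider the matrix

$$
\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}.
$$

It should be clear that the rank of this matrix is 1 because row 2 is
a multiple of row 1. We can construct the null space by solving

$$
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
$$

This leads to two equations:

$$
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 0, \\
2x_1 + 4x_2 + 6x_3 &= 0,
\end{aligned}
$$

which are really just one equation. Then, setting

$$
x_1 + 2x_2 + 3x_3 = 0
$$

implies that

$$
x_1 = -2x_2 - 3x_3.
$$

We can set $x_2 = \alpha_1$ and $x_3 = \alpha_2$ as free variables. (Here, $\alpha_1$ and
$\alpha_2$ may take on any values.) Then, the solutions to this system of
equations have the form

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -2\alpha_1 - 3\alpha_2 \\ \alpha_1 \\ \alpha_2 \end{bmatrix} = \alpha_1 \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + \alpha_2 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}.
$$

This means that the space $\mathcal{N}(\mathbf{A})$ is spanned by the vectors $(-2, 1, 0)$
and $(-3, 0, 1)$. This is a basis for the null space. Consequently, since
it has two basis vectors, it has a dimension of 2 and it can be put
into one-to-one correspondence with $\mathbb{F}^2$ (in a nonobvious way). (See
Remark A.48.)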

Definition A.65 (Nullity). The nullity of a matrix $\mathbf{A} \in \mathbb{F}^{m \times n}$ is
the dimension of the null space.


Theorem A.66 (Rank–nullity theorem). Suppose $\mathbf{A} \in \mathbb{F}^{m \times n}$.
Then, the sum of the rank and nullity of $\mathbf{A}$ is $n$.


Example A.67. We know that the matrix

$$
\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}
$$

is in $\mathbb{R}^{2 \times 3}$. We showed that its null space has a dimension of 2, and
it is obvious that its rank is 1. We see at once that $1 + 2 = 3$.
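
The pieces of this example can be verified directly. The following is a minimal sketch, assuming the matrix with rows $(1, 2, 3)$ and $(2, 4, 6)$ and the null space basis $(-2, 1, 0)$, $(-3, 0, 1)$ found above. The function name `mat_vec` is illustrative.

```python
def mat_vec(A, x):
    """Matrix-vector product Ax for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 2, 3], [2, 4, 6]]
null_basis = [(-2, 1, 0), (-3, 0, 1)]

# Row 2 is twice row 1, so the rows span a one-dimensional space.
rank_is_one = all(2 * a == b for a, b in zip(A[0], A[1]))

# Each basis vector, and any combination of them, is sent to zero by A.
images = [mat_vec(A, v) for v in null_basis]
combo = tuple(4 * p - 5 * q for p, q in zip(*null_basis))
combo_image = mat_vec(A, combo)

n_columns = len(A[0])
print(rank_is_one, images, combo_image, 1 + len(null_basis) == n_columns)
```

The last printed value confirms the count in the rank–nullity theorem for this matrix: one independent row plus two null space basis vectors equals three columns.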


A.9 Determinant 


Remark A.68. The next definition uses concepts from abstract 
algebra covered in Definition 8.27. The reader who is not familiar 
with permutation groups should review this first. 


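Definition A.69 (Determinant). Let $\mathbf{A} \in \mathbb{F}^{n \times n}$. The determinant
of $\mathbf{A}$ is

$$
\det(\mathbf{A}) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} \mathbf{A}_{i\sigma(i)}. \tag{A.17}
$$

Here, $\sigma \in S_n$ represents a permutation over the set $\{1, \dots, n\}$, and
$\sigma(i)$ represents the value to which $i$ is mapped under $\sigma$. The sign of
the permutation $\operatorname{sgn}(\sigma)$ is given in Definition 8.38.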


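Example A.70. Consider an arbitrary $2 \times 2$ matrix:

$$
\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
$$

There are only two permutations in the set $S_2$: the identity permutation
(which is even) and the transposition $(1, 2)$ (which is odd).
Thus, we have

$$
\det(\mathbf{A}) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = \mathbf{A}_{11}\mathbf{A}_{22} - \mathbf{A}_{12}\mathbf{A}_{21} = ad - bc.
$$

This is the formula usually given in a course on matrix algebra.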
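
Equation (A.17) can be implemented literally for small matrices. The following is a minimal sketch, assuming the sign of a permutation is computed from the parity of its inversion count (a standard characterization, used here in place of the book's Definition 8.38). The function names `sign` and `det` are illustrative, and indices run from 0 rather than 1.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (inversion parity)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant as a signed sum over all permutations of the columns."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

print(det([[1, 2], [3, 4]]))
print(det([[2, 0, 0], [0, 4, 0], [0, 0, 6]]))
```

The sum has $n!$ terms, so this approach is only practical for very small $n$; it is useful for checking definitions, not for computation at scale.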

Proposition A.71. The determinant of any identity matrix is 1.


Remark A.72. Like many other definitions in mathematics, Def- 
inition A.69 can be useful for proving things but not very useful 
for computing determinants. Most linear algebra textbooks, such as 
Refs. [97, 100,135,136], discuss formulas and algorithms for efficiently 
computing matrix determinants. 


A.10 Eigenvalues and Eigenvectors 


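Definition A.73 (Algebraic closure). Let $\mathbb{F}$ be a field. The algebraic
closure of $\mathbb{F}$, denoted by $\overline{\mathbb{F}}$, is an extension of $\mathbb{F}$ that is (i) also
a field and (ii) has every possible root to any polynomial with coefficients
drawn from $\mathbb{F}$.


Remark A.74. A field $\mathbb{F}$ is called algebraically closed if $\mathbb{F} = \overline{\mathbb{F}}$.


Theorem A.75. The algebraic closure of $\mathbb{R}$ is $\mathbb{C}$. The field $\mathbb{C}$ is
algebraically closed.


Definition A.76 (Eigenvalue and (right) eigenvector). Let
$\mathbf{A} \in \mathbb{F}^{n \times n}$. An eigenvalue–eigenvector pair $(\lambda, \mathbf{x})$ is a scalar and an $n \times 1$
vector such that

$$
\mathbf{A}\mathbf{x} = \lambda\mathbf{x} \tag{A.18}
$$

and $\mathbf{x} \neq \mathbf{0}$. The eigenvalue may be drawn from $\overline{\mathbb{F}}$ and $\mathbf{x}$ from $\overline{\mathbb{F}}^n$.


Lemma A.77. A value $\lambda \in \mathbb{F}$ is an eigenvalue of $\mathbf{A} \in \mathbb{F}^{n \times n}$ if and
only if $\lambda\mathbf{I}_n - \mathbf{A}$ is not invertible.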


Remark A.78. A left eigenvector is defined analogously with
$\mathbf{x}^T\mathbf{A} = \lambda\mathbf{x}^T$, when $\mathbf{x}$ is considered a column vector. We deal exclusively
with right eigenvectors, and hence, when we say “eigenvector,”
we mean a right eigenvector.


Definition A.79 (Characteristic polynomial). If $\mathbf{A} \in \mathbb{F}^{n \times n}$,
then its characteristic polynomial is the degree-$n$ polynomial

$$
\det(\lambda\mathbf{I}_n - \mathbf{A}). \tag{A.19}
$$

Theorem A.80. A value $\lambda$ is an eigenvalue of $\mathbf{A} \in \mathbb{F}^{n \times n}$ if and only
if it satisfies the characteristic equation

$$
\det(\lambda\mathbf{I}_n - \mathbf{A}) = 0.
$$

That is, $\lambda$ is a root of the characteristic polynomial.


Remark A.81. We now see why $\lambda$ may be in $\overline{\mathbb{F}}$, rather than in $\mathbb{F}$.
It is possible that the characteristic polynomial of a matrix does not
have all (or any) of its roots in the field $\mathbb{F}$; the definition of algebraic
closure ensures that all eigenvalues are contained in the algebraic
closure of $\mathbb{F}$.


Corollary A.82. If $\mathbf{A} \in \mathbb{F}^{n \times n}$, then $\mathbf{A}$ and $\mathbf{A}^T$ share eigenvalues.


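Example A.83. Consider the matrix

$$
\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}.
$$

The characteristic polynomial is computed as

$$
\det(\lambda\mathbf{I}_n - \mathbf{A}) = \begin{vmatrix} \lambda - 1 & 0 \\ 0 & \lambda - 2 \end{vmatrix} = (\lambda - 1)(\lambda - 2).
$$

Thus, the characteristic polynomial for this matrix is

$$
\lambda^2 - 3\lambda + 2. \tag{A.20}
$$

The roots of this polynomial are $\lambda_1 = 1$ and $\lambda_2 = 2$. Using these
eigenvalues, we can compute eigenvectors as

$$
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \tag{A.21}
$$

$$
\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \tag{A.22}
$$

and observe that

$$
\mathbf{A}\mathbf{v}_1 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \lambda_1\mathbf{v}_1, \tag{A.23}
$$

and

$$
\mathbf{A}\mathbf{v}_2 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \lambda_2\mathbf{v}_2, \tag{A.24}
$$

as required. The computation of eigenvalues and eigenvectors is usually
accomplished using a computer, for which several algorithms
have been developed. Interested readers should consult, for
example, Ref. [100].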


Remark A.84. You can use your calculator to return the eigenvalues
and eigenvectors of a matrix, as well as several software packages,
such as MATLAB™ and Mathematica™.


Remark A.85. It is important to remember that eigenvectors are
unique up to scale. That is, if $\mathbf{A}$ is a square matrix and $(\lambda, \mathbf{x})$ is an
eigenvalue–eigenvector pair for $\mathbf{A}$, then so is $(\lambda, \alpha\mathbf{x})$ for $\alpha \neq 0$. This
is because

$$
\mathbf{A}\mathbf{x} = \lambda\mathbf{x} \implies \mathbf{A}(\alpha\mathbf{x}) = \lambda(\alpha\mathbf{x}). \tag{A.25}
$$


Definition A.86 (Algebraic multiplicity of an eigenvalue). 
An eigenvalue has algebraic multiplicity greater than 1 if it is a mul- 
tiple root of the characteristic polynomial. The algebraic multiplicity 
of the root is the multiplicity of the eigenvalue. 


Example A.87. Consider the identity matrix $\mathbf{I}_2$. It has a characteristic
polynomial of $(\lambda - 1)^2$, which has one multiple root 1 of multiplicity
2. However, this matrix does have two eigenvectors: $[1\ 0]^T$
and $[0\ 1]^T$.


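Example A.88. Consider the matrix

$$
\mathbf{A} = \begin{bmatrix} 1 & 5 \\ 2 & 4 \end{bmatrix}.
$$

The characteristic polynomial is computed as

$$
\begin{vmatrix} \lambda - 1 & -5 \\ -2 & \lambda - 4 \end{vmatrix} = (\lambda - 1)(\lambda - 4) - 10 = \lambda^2 - 5\lambda - 6.
$$

Thus, there are two distinct eigenvalues: $\lambda = -1$ and $\lambda = 6$, which
are the two roots of the characteristic polynomial. We can compute
the two eigenvectors in turn. Consider $\lambda = -1$. We solve for

$$
\begin{bmatrix} \lambda - 1 & -5 \\ -2 & \lambda - 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -2 & -5 \\ -2 & -5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
$$

Thus,

$$
-2x_1 - 5x_2 = 0.
$$

We can set $x_2 = t$ as a free variable. Consequently, the solution is

$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -\frac{5}{2}t \\ t \end{bmatrix} = t \begin{bmatrix} -\frac{5}{2} \\ 1 \end{bmatrix}.
$$

Thus, any eigenvector of $\lambda = -1$ is a multiple of the vector $(-5/2, 1)$.
For the eigenvalue $\lambda = 6$, we have

$$
\begin{bmatrix} \lambda - 1 & -5 \\ -2 & \lambda - 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 & -5 \\ -2 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
$$

From this, we see that

$$
-2x_1 + 2x_2 = 0,
$$

or $x_1 = x_2$. Thus, setting $x_2 = t$, we have the solution

$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix} = t \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
$$

Thus, any eigenvector of $\lambda = 6$ is a multiple of the vector $(1, 1)$.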
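
The eigenpairs of Example A.88 can be confirmed by direct substitution into $\mathbf{A}\mathbf{x} = \lambda\mathbf{x}$. The following is a minimal sketch, assuming the matrix with rows $(1, 5)$ and $(2, 4)$ as read from the characteristic matrix above. Note the sign: for $\lambda = -1$, the equation $-2x_1 - 5x_2 = 0$ gives multiples of $(-5/2, 1)$. The function names are illustrative.

```python
from fractions import Fraction

A = [[1, 5], [2, 4]]

def mat_vec(A, x):
    """Matrix-vector product Ax for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def char_poly(A, lam):
    """det(lam * I - A) for a 2 x 2 matrix."""
    return (lam - A[0][0]) * (lam - A[1][1]) - A[0][1] * A[1][0]

def is_eigenpair(A, lam, x):
    """True when x is nonzero and Ax equals lam * x."""
    return any(c != 0 for c in x) and mat_vec(A, x) == [lam * c for c in x]

pairs = [(-1, (Fraction(-5, 2), 1)), (6, (1, 1))]
for lam, x in pairs:
    scaled = tuple(4 * c for c in x)  # eigenvectors are unique up to scale
    print(lam, char_poly(A, lam), is_eigenpair(A, lam, x),
          is_eigenpair(A, lam, scaled))
```

Both eigenvalues are roots of the characteristic polynomial, and scaling an eigenvector by a nonzero constant preserves the eigenpair property.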


Theorem A.89. Suppose that $\mathbf{A} \in \mathbb{F}^{n \times n}$ with eigenvalues $\lambda_1, \dots, \lambda_n$
all distinct (i.e., $\lambda_i \neq \lambda_j$ if $i \neq j$). Then, the corresponding eigenvectors
$\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ are linearly independent.


Definition A.90. Let $\mathbf{A} \in \mathbb{F}^{n \times n}$ with eigenvectors $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$.
Then, the vector space $\mathcal{E} = \operatorname{span}(\{\mathbf{v}_1, \dots, \mathbf{v}_n\})$ is called the
eigenspace of $\mathbf{A}$. When the eigenvectors are linearly independent,
they are an eigenbasis for the space they span.


Remark A.91. It is worth noting that if $\mathbf{v}_i$ is an eigenvector of $\mathbf{A}$,
then $\operatorname{span}(\mathbf{v}_i)$ is called the eigenspace associated with $\mathbf{v}_i$.


Definition A.92 (Geometric multiplicity). Each eigenvalue corresponds
to a subset of eigenvectors. The geometric multiplicity is
the dimension of the eigenspace to which it corresponds.

Example A.93. As we have seen, eigenvalue 1 for $\mathbf{I}_2$ corresponds
to all of $\mathbb{R}^2$. Therefore, it has a geometric multiplicity of 2 as well as
an algebraic multiplicity of 2.


Remark A.94. It is not the case that the geometric and algebraic 
multiplicities of an eigenvalue must be equal. In general, the alge- 
braic multiplicity of an eigenvalue will be greater than or equal to 
its geometric multiplicity. However, if the algebraic multiplicity of 
an eigenvalue is 1, then this ensures that it will have a geometric 
multiplicity of 1. 


Corollary A.95. The eigenvectors of $\mathbf{A} \in \mathbb{F}^{n \times n}$ form a basis for $\mathbb{F}^n$
when the eigenvalues of $\mathbf{A}$ are distinct.


Remark A.96. The following (and final) theorem is crucial to our
understanding of algebraic graph theory. It is also a useful theorem
in its own right and has generalizations in complex vector spaces $\mathbb{C}^n$
that have important consequences for quantum mechanics.


Theorem A.97 (Spectral theorem for real symmetric matrices).
Suppose $\mathbf{A} \in \mathbb{R}^{n \times n}$ is a real, symmetric matrix. Then, $\mathbf{A}$ has
real eigenvalues, and the eigenvectors form an orthonormal basis
for $\mathbb{R}^n$.


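Remark A.98. The proof of the spectral theorem is given in almost
every linear algebra textbook. Lang’s presentation is straightforward
[101]. The easy part is the proof that the eigenvalues are real.
To see this, suppose that $\lambda$ is a complex eigenvalue with a complex
eigenvector $\mathbf{z}$. Then, $\mathbf{A}\mathbf{z} = \lambda\mathbf{z}$. If $\lambda = a + bi$, then $\overline{\lambda} = a - bi$. Note
that $\lambda\overline{\lambda} = a^2 + b^2 \in \mathbb{R}$. Furthermore, if $\lambda$ and $\mu$ are two complex
numbers, it’s easy to show that $\overline{\lambda\mu} = \overline{\lambda}\,\overline{\mu}$. Since $\mathbf{A}$ is real (and symmetric),
we can conclude that

$$
\overline{\mathbf{A}\mathbf{z}} = \mathbf{A}\overline{\mathbf{z}} = \overline{\lambda}\,\overline{\mathbf{z}}.
$$

Now, $(\mathbf{A}\overline{\mathbf{z}})^T = \overline{\mathbf{z}}^T\mathbf{A} = \overline{\lambda}\,\overline{\mathbf{z}}^T$. We have

$$
\begin{aligned}
\mathbf{A}\mathbf{z} = \lambda\mathbf{z} &\implies \overline{\mathbf{z}}^T\mathbf{A}\mathbf{z} = \lambda\overline{\mathbf{z}}^T\mathbf{z}, \\
\overline{\mathbf{z}}^T\mathbf{A} = \overline{\lambda}\,\overline{\mathbf{z}}^T &\implies \overline{\mathbf{z}}^T\mathbf{A}\mathbf{z} = \overline{\lambda}\,\overline{\mathbf{z}}^T\mathbf{z}.
\end{aligned}
$$

But then,

$$
\lambda\overline{\mathbf{z}}^T\mathbf{z} = \overline{\lambda}\,\overline{\mathbf{z}}^T\mathbf{z}.
$$

Since $\mathbf{z} \neq \mathbf{0}$, we have $\overline{\mathbf{z}}^T\mathbf{z} > 0$. This implies that $\lambda = \overline{\lambda}$, which means $\lambda$ must be real.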
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The rest of the proof, namely that the eigenvectors form an 
orthogonal basis, is the harder part and is well outside the scope 
of this appendix. 
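
The statement of the spectral theorem can at least be checked by hand in the 2 x 2 case. The following is a minimal sketch, assuming a real symmetric matrix [[a, b], [b, d]]; the function name `symmetric_2x2_eigen` and the sample matrix are hypothetical choices made for illustration. It uses the quadratic formula on the characteristic polynomial, so it is not a general-purpose eigenvalue routine.

```python
import math


def symmetric_2x2_eigen(a, b, d):
    """Return [(eigenvalue, unit eigenvector), ...] for [[a, b], [b, d]]."""
    if b == 0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    # Characteristic polynomial: t^2 - (a + d) t + (a d - b^2).
    # Its discriminant (a - d)^2 + 4 b^2 is never negative, so both
    # roots are real, as the theorem asserts.
    root = math.sqrt((a - d) ** 2 + 4 * b ** 2)
    pairs = []
    for lam in ((a + d + root) / 2, (a + d - root) / 2):
        # (b, lam - a) solves (A - lam I) v = 0 whenever b != 0.
        v = (b, lam - a)
        norm = math.hypot(*v)
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


(lam1, v1), (lam2, v2) = symmetric_2x2_eigen(2.0, 1.0, 2.0)
dot = v1[0] * v2[0] + v1[1] * v2[1]
print(lam1, lam2)  # 3.0 1.0
```

For this sample matrix the two unit eigenvectors are proportional to (1, 1) and (1, -1), so their dot product vanishes and they form an orthonormal basis for the plane.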


A.11 Exercises 


Exercise A.1 

Why is Z not a field under ordinary addition and multiplication? Is 
Q, the set of rational numbers, a field under the usual addition and 
multiplication operations? 


Exercise A.2 
Show that \((\mathbb{R}^{2 \times 1}, +, \mathbf{0})\) is a group.


Exercise A.3 
Prove Proposition A.14. 


Exercise A.4 
Let \(A, B \in \mathbb{R}^{m \times n}\). Use the definitions of matrix addition and
transpose to prove that

\[ (A + B)^T = A^T + B^T. \tag{A.26} \]

[Hint: If \(C = A + B\), then \(C_{ij} = A_{ij} + B_{ij}\), the element in the \((i,j)\)th
position of matrix \(C\). This element moves to the \((j,i)\)th position in
the transpose. The \((j,i)\)th position of \(A^T + B^T\) is \(A^T_{ji} + B^T_{ji}\), but
\(A^T_{ji} = A_{ij}\). Reason from this point onward.]


Exercise A.5 

Let \(A, B \in \mathbb{R}^{m \times n}\). Prove by example that \(AB \neq BA\); that is, matrix
multiplication is not commutative. [Hint: Almost any pair of matrices
you pick (that can be multiplied) will not commute.]


Exercise A.6 
Let \(A \in \mathbb{F}^{m \times n}\), and let \(B \in \mathbb{R}^{n \times p}\). Use the definitions of matrix
multiplication and transpose to prove that

\[ (AB)^T = B^T A^T. \tag{A.27} \]

[Hint: Use similar reasoning to the hint in Exercise A.4. But this
time, note that \(C_{ij} = A_{i\cdot} \cdot B_{\cdot j}\), which moves to the \((j,i)\)th position.
Now, figure out what is in the \((j,i)\)th position of \(B^T A^T\).]


Exercise A.7
Show that \((\mathbb{F}^{n \times n}, +, \mathbf{0})\) is a group, with \(\mathbf{0}\) being the zero matrix.


Exercise A.8 
Let \(A \in \mathbb{F}^{n \times n}\). Show that \(A I_n = I_n A = A\). Hence, \(I_n\) is an identity
for the matrix multiplication operation on square matrices.


Exercise A.9 
Prove that if \(A_1, \dots, A_n \in \mathbb{F}^{n \times n}\) are invertible, then
\((A_1 \cdots A_n)^{-1} = A_n^{-1} \cdots A_1^{-1}\) for \(n \geq 1\).


Exercise A.10 
Consider the vectors \(\mathbf{v}_1 = (0, 0)\) and \(\mathbf{v}_2 = (1, 0)\). Are these vectors
linearly independent? Explain why or why not.


Exercise A.11 
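Show that the vectors

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, \quad \text{and} \quad
\mathbf{v}_3 = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}
\]

are not linearly independent. [Hint: Following the examples, create
a system of equations and show that there is a solution not equal
to \(\mathbf{0}\).]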


Exercise A.12 
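Why are the vectors

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, \quad \text{and} \quad
\mathbf{v}_3 = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}
\]

not a basis for \(\mathbb{R}^3\)?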


Exercise A.13 
Prove Proposition A.63. 


Exercise A.14 
Prove Proposition A.71.


Exercise A.15 
Find the eigenvalues and eigenvectors of the matrix 


a= (13 


Exercise A.16
Show that every vector in \(\mathbb{F}^2\) is an eigenvector of \(I_2\).


Exercise A.17 
Prove Corollary A.82. 


Exercise A.18 
Prove Corollary A.95. 


Appendix B 


A Brief Introduction 
to Probability Theory 


Remark B.1 (Chapter goals). This appendix provides a brief 
(yet theoretical) introduction to probability theory for students who 
have no background in probability. Proofs are omitted for brevity. 


B.1 Probability 


Remark B.2. The proper study of probability theory requires a 
heavy dose of measure theory, which is well beyond the scope of 
a course in graph theory. This appendix is meant to provide a 
somewhat intuitive introduction to probability theory with a small 
amount of mathematical rigor added. This is more than sufficient to 
understand the probability calculations that occur when we discuss 
Markov chains in Chapter 10. Most of the definitions in this chapter 
are motivated by examples from games. We'll begin with an example. 


Example B.3. Suppose you have made it to the very final stage
of Deal or No Deal. Two suitcases with money remain in play: one
contains $0.01, while the other contains $1,000,000. The banker has
offered you a payoff of $499,999. Do you accept the banker's safe
offer, or do you risk it all to try for $1,000,000? Suppose the banker
offers you $100,000. What about $500,000 or $10,000?


Definition B.4 (Outcome). Let \(\Omega\) be a finite set of elements
describing the outcome of a chance event (a coin toss, a roll of the
dice, etc.). We call \(\Omega\) the sample space. Each element of \(\Omega\) is called
an outcome.


Example B.5. Congratulations! You have made it to the very final
stage of Deal or No Deal. Two suitcases with money remain in play:
one contains $0.01, while the other contains $1,000,000. The banker
has offered you a payoff of $499,999. Do you accept the banker's safe
offer, or do you risk it all to try for $1,000,000? Suppose the banker
offers you $100,000. What about $500,000 or $10,000? To even begin
to reason about this situation, we note that the only part of the world
we care about is the position of the $1,000,000 and the $0.01 within the
suitcases. In this case, \(\Omega\) consists of two possible outcomes: $1,000,000
is in suitcase number 1 (while $0.01 is in suitcase number 2)
or $1,000,000 is in suitcase number 2 (while $0.01 is in suitcase
number 1).

Formally, let us refer to the first outcome as \(A\) and the second
outcome as \(B\). Then, \(\Omega = \{A, B\}\).


Definition B.6 (Event). If \(\Omega\) is a sample space, then an event is
any subset of \(\Omega\).


Example B.7. Clearly, the sample space in Example B.5 admits
precisely four events: \(\emptyset\) (the empty event), \(\{A\}\), \(\{B\}\), and
\(\{A, B\} = \Omega\). These four sets represent all possible subsets of the set
\(\Omega = \{A, B\}\).


Definition B.8 (Union). If \(E, F \subseteq \Omega\) are both events, then \(E \cup F\)
is the union of the sets \(E\) and \(F\) and consists of all outcomes in either
\(E\) or \(F\). Event \(E \cup F\) occurs if either event \(E\) or event \(F\) occurs.


Example B.9. Consider the roll of a fair six-sided die. The
outcomes are 1, ..., 6. If \(E = \{1, 3\}\) and \(F = \{2, 4\}\), then \(E \cup F =
\{1, 2, 3, 4\}\) and will occur as long as we don't roll a 5 or 6.


Definition B.10 (Intersection). If \(E, F \subseteq \Omega\) are both events,
then \(E \cap F\) is the intersection of the sets \(E\) and \(F\) and consists
of all outcomes in both \(E\) and \(F\). Event \(E \cap F\) occurs if both event
\(E\) and event \(F\) occur.


Example B.11. Again, consider the roll of a fair six-sided die. The
outcomes are 1, ..., 6. If \(E = \{1, 2\}\) and \(F = \{2, 4\}\), then \(E \cap F = \{2\}\)
and will occur only if we roll a 2.


Definition B.12 (Mutual exclusivity). Two events \(E, F \subseteq \Omega\)
are said to be mutually exclusive if and only if \(E \cap F = \emptyset\).


Definition B.13 (Discrete probability distribution (function)).
Given a discrete sample space \(\Omega\), let \(\mathcal{F}\) be the set of all events
on \(\Omega\). A discrete probability function is a mapping \(P : \mathcal{F} \to [0, 1]\)
with the following properties:


(1) \(P(\Omega) = 1\).
(2) If \(E, F \in \mathcal{F}\) and \(E \cap F = \emptyset\), then \(P(E \cup F) = P(E) + P(F)\).


Remark B.14 (Power set). In this definition, we talk about the
set \(\mathcal{F}\) as the set of all events over a set of outcomes \(\Omega\). This is an
example of the power set: the set of all subsets of a set. We sometimes
denote this set as \(2^\Omega\). Thus, if \(\Omega\) is a set, then \(2^\Omega\) is the power set of
\(\Omega\) or the set of all subsets of \(\Omega\). (See Definition 5.54.)


Remark B.15. Definition B.13 is surprisingly technical and prob- 
ably does not conform to your ordinary sense of what probability 
is. It’s best not to think of probability in this very formal way and 
instead to think that a probability function assigns a number to an 
outcome (or event) that tells you the chances of it occurring. Put 
more simply, suppose we could run an experiment where the result 
of that experiment will be an outcome in \(\Omega\). The function \(P\) simply
tells us the proportion of times we will observe an event \(E \subseteq \Omega\) if we
ran this experiment an exceedingly large number of times.


Example B.16. Suppose we could play the Deal or No Deal example
over and over again and observe where the money ends up.
A smart game show would mix the money up so that approximately
one half of the time we observe $1,000,000 in suitcase 1 and the other
half of the time we observe this money in suitcase 2.

A probability distribution formalizes this notion and might assign
1/2 to event \(\{A\}\) and 1/2 to event \(\{B\}\). However, to obtain a true
probability distribution, we must also assign probabilities to \(\emptyset\) and
\(\{A, B\}\). In the former case, we know that something must happen!
Therefore, we can assign 0 to event \(\emptyset\). In the latter case, we know
for certain that either outcome \(A\) or outcome \(B\) must occur; so,
in this case, we assign a value of 1.


Example B.17. For a fair six-sided die, the probability of rolling
any value is 1/6. Formally, \(\Omega = \{1, 2, \dots, 6\}\), and any roll yields an
event with only one element, \(\{\omega\}\), where \(\omega\) is some value in \(\Omega\). If we
consider the event \(E = \{1, 2, 3\}\), then \(P(E)\) gives us the probability
that we will roll a 1, 2, or 3. Since \(\{1\}\), \(\{2\}\), and \(\{3\}\) are disjoint sets
and \(\{1, 2, 3\} = \{1\} \cup \{2\} \cup \{3\}\), we know that

\[ P(E) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}. \]


Definition B.18 (Discrete probability space). The triple
\((\Omega, \mathcal{F}, P)\) is called a discrete probability space over \(\Omega\).


Lemma B.19. Let \((\Omega, \mathcal{F}, P)\) be a discrete probability space. Then,
\(P(\emptyset) = 0\).


Lemma B.20. Let \((\Omega, \mathcal{F}, P)\) be a discrete probability space, and let
\(E, F \in \mathcal{F}\). Then,

\[ P(E \cup F) = P(E) + P(F) - P(E \cap F). \tag{B.1} \]
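
The definitions and lemmas so far can be checked exhaustively on a small sample space. The sketch below assumes the fair six-sided die of Example B.17 and represents events as Python `frozenset` objects; the names `OMEGA`, `power_set`, `prob`, and `EVENTS` are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))


def power_set(s):
    """Return every subset of s as a frozenset (the set of all events)."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(sub) for sub in subsets]


def prob(event):
    """Fair-die probability: each outcome carries weight 1/6."""
    return Fraction(len(event), len(OMEGA))


EVENTS = power_set(OMEGA)  # 2^6 = 64 events

# Property (1) of a probability function, and the empty-event lemma.
assert prob(OMEGA) == 1
assert prob(frozenset()) == 0

# Property (2) and the union formula, checked over every pair of events.
for E in EVENTS:
    for F in EVENTS:
        if not (E & F):
            assert prob(E | F) == prob(E) + prob(F)
        assert prob(E | F) == prob(E) + prob(F) - prob(E & F)

print(len(EVENTS), prob(frozenset({1, 2, 3})))  # 64 1/2
```

Using `Fraction` keeps every probability exact, so the equalities are tested without floating-point tolerance.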


Definition B.21 (Set complement). Let \(\Omega\) be a set of outcomes.
Let \(E \subseteq \Omega\), and define \(E^c\) to be the set of elements of \(\Omega\) not in \(E\).
This is called the complement of \(E\) in \(\Omega\).


Lemma B.22. Let \((\Omega, \mathcal{F}, P)\) be a discrete probability space, and let
\(E, F \in \mathcal{F}\). Then,

\[ P(E) = P(E \cap F) + P(E \cap F^c). \tag{B.2} \]


Theorem B.23. Let \((\Omega, \mathcal{F}, P)\) be a discrete probability space, and let
\(E \in \mathcal{F}\). Let \(F_1, \dots, F_n\) be any pairwise-disjoint collection of sets that
partition \(\Omega\). That is, assume

\[ \Omega = \bigcup_{i=1}^{n} F_i. \tag{B.3} \]

Then,

\[ P(E) = \sum_{i=1}^{n} P(E \cap F_i). \tag{B.4} \]


Example B.24. Welcome to Vegas! We're playing craps. In craps,
we roll two dice, and winning combinations are determined by the
sum of the values on the dice. An ideal first craps roll is 7. The sample
space \(\Omega\), in which we are interested, has 36 elements, one for each
pair of values the dice can show (the related set of sums can be
easily obtained).

Suppose that the dice are colored blue and red (so they can be
distinguished), and let's call the blue die number one and the red
die number two. Let's suppose we are interested in the event that
we roll a 1 on die number one and that the pair of values obtained
sums to 7. There is only one way this can occur, namely, we roll a 1
on die number one and a 6 on die number two. Thus, the probability
of this occurring is 1/36. In this case, event \(E\) is the event that we
roll a 7 in our craps game and event \(F_1\) is the event that die number
one shows a 1. We could also consider event \(F_2\) that die number one
shows a 2. By similar reasoning, we know that the probability of both
\(E\) and \(F_2\) occurring is 1/36. In fact, if \(F_i\) is the event that die
number one shows a value of \(i\) (\(i = 1, \dots, 6\)), then we know that

\[ P(E \cap F_i) = \frac{1}{36}. \]

Clearly, the events \(F_i\) (\(i = 1, \dots, 6\)) are pairwise disjoint (you can't
have both a 1 and a 2 on the same die). Furthermore, \(\Omega = F_1 \cup F_2 \cup
\cdots \cup F_6\). (After all, some number has to appear on die number one!)
Thus, we can compute

\[ P(E) = \sum_{i=1}^{6} P(E \cap F_i) = \frac{6}{36} = \frac{1}{6}. \]
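
The partition argument in this example can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming outcomes are stored as pairs (die one, die two) and that all 36 pairs are equally likely; the names `OMEGA`, `prob`, `pieces`, and `total` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes, written as (die one, die two).
OMEGA = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event under the uniform distribution on OMEGA."""
    return Fraction(len(event), len(OMEGA))


# E: the two dice sum to 7.
E = {w for w in OMEGA if w[0] + w[1] == 7}

# F[i]: die number one shows the value i. These six events partition OMEGA.
F = {i: {w for w in OMEGA if w[0] == i} for i in range(1, 7)}

# Each piece P(E and F_i) is 1/36; summing the pieces recovers P(E).
pieces = [prob(E & F[i]) for i in range(1, 7)]
total = sum(pieces)

print(pieces[0], total, prob(E))  # 1/36 1/6 1/6
```

The same check fails if the sets in `F` overlap, which is why the partition hypothesis matters.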


B.2 Random Variables and Expected Values


Remark B.25. The concept of a random variable can be made 
extremely mathematically specific. A good intuitive understanding of 
a random variable is a variable X whose value is not known a priori 
and which is determined according to some probability distribution
\(P\) that is a part of a probability space \((\Omega, \mathcal{F}, P)\).


Example B.26. Suppose that we consider flipping a fair coin. Then, 
the probability of seeing heads (or tails) should be 1/2. If we let X 
be a random variable that provides the outcome of the flip, then it 
will take on values heads or tails, and it will take each value exactly 
50% of the time. 


Remark B.27. The problem with allowing a random variable to 
take on arbitrary values (such as heads or tails) is that it makes 
it difficult to use random variables in formulae involving numbers. 
There is a very technical definition of random variable that arises 
in formal probability theory. However, it is well beyond the scope of 
what we need. We can, however, get a flavor for this definition in the 
following restricted form. 


Definition B.28 (Random variable). Let \((\Omega, \mathcal{F}, P)\) be a discrete
probability space. Let \(D \subseteq \mathbb{R}\) be a finite discrete subset of real
numbers. A random variable \(X\) is a function that maps each element of
\(\Omega\) to an element of \(D\). Formally, \(X : \Omega \to D\).


Remark B.29. Clearly, if \(S \subseteq D\), then \(X^{-1}(S) = \{\omega \in \Omega \mid X(\omega) \in
S\} \in \mathcal{F}\). We can think of the probability of \(X\) taking on a value in
\(S \subseteq D\) as precisely \(P(X^{-1}(S))\).

Using this observation, if \((\Omega, \mathcal{F}, P)\) is a discrete probability
space, \(X : \Omega \to D\) is a random variable, and \(x \in D\),
then let \(P(x) = P(X^{-1}(\{x\}))\). That is, the probability of \(X\) taking a
value of \(x\) is the probability of the set of outcomes in \(\Omega\) that \(X\) maps to \(x\).


Example B.30. Consider our coin-flipping random variable. Instead
of having \(X\) take values heads or tails, we can instead let \(X\) take on
values 1 if the coin comes up heads and 0 if the coin comes up tails.
Thus, if \(\Omega = \{\text{heads}, \text{tails}\}\), then \(X(\text{heads}) = 1\) and \(X(\text{tails}) = 0\).


Example B.31. When \(\Omega\) (in probability space \((\Omega, \mathcal{F}, P)\)) is already
a subset of \(\mathbb{R}\), then defining random variables is very easy. The
random variable can just be the obvious mapping from \(\Omega\) into itself. For
example, if we consider rolling a fair die, then \(\Omega = \{1, \dots, 6\}\) and
any random variable defined on \((\Omega, \mathcal{F}, P)\) will take on values \(1, \dots, 6\).


Definition B.32. Let \((\Omega, \mathcal{F}, P)\) be a discrete probability space,
and let \(X : \Omega \to D\) be a random variable. Then, the expected
value of \(X\) is

\[ \mathbb{E}(X) = \sum_{x \in D} x P(x). \tag{B.5} \]


Example B.33. Let's play a die-rolling game. You put up your own
money. Even numbers lose $10 times the number rolled, while odd
numbers win $12 times the number rolled. What is the expected
amount of money you'll win in this game?

Let \(\Omega = \{1, \dots, 6\}\). Then, \(D = \{12, -20, 36, -40, 60, -60\}\): these
are the dollar values you will win for various rolls of the die. Then,
the expected value of \(X\) is

\[
\mathbb{E}(X) = 12\left(\frac{1}{6}\right) + (-20)\left(\frac{1}{6}\right) + 36\left(\frac{1}{6}\right)
+ (-40)\left(\frac{1}{6}\right) + 60\left(\frac{1}{6}\right) + (-60)\left(\frac{1}{6}\right) = -2. \tag{B.6}
\]

Would you still want to play this game considering the expected
payoff is -$2?
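
The expected-value computation for this game is short enough to script. The sketch below assumes the payoff rule exactly as stated in the example (odd rolls win $12 times the roll, even rolls lose $10 times the roll); the names `payoff`, `prob_of_value`, and `expected` are illustrative.

```python
from fractions import Fraction

OMEGA = range(1, 7)                      # outcomes of one fair die roll
P = {w: Fraction(1, 6) for w in OMEGA}   # each outcome has probability 1/6


def payoff(roll):
    """Random variable X: dollars won (negative means lost) for a roll."""
    return 12 * roll if roll % 2 == 1 else -10 * roll


# D is the set of values the random variable can take.
D = sorted({payoff(w) for w in OMEGA})


def prob_of_value(x):
    """P(x): total probability of the outcomes that X maps to x."""
    return sum(P[w] for w in OMEGA if payoff(w) == x)


# Expected value: sum over x in D of x * P(x).
expected = sum(x * prob_of_value(x) for x in D)

print(D)         # [-60, -40, -20, 12, 36, 60]
print(expected)  # -2
```

Changing the odd-roll multiplier from 12 to 14 in `payoff` would make the game break even, since 14(1 + 3 + 5) = 10(2 + 4 + 6).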


B.3 Conditional Probability 


Remark B.34. Suppose we are given a discrete probability space
\((\Omega, \mathcal{F}, P)\) and we are told that an event \(E\) has occurred. We now wish
to compute the probability that some other event \(F\) has occurred.
This value is called the conditional probability of event \(F\) given event
\(E\) and is written as \(P(F|E)\).


Example B.35. Suppose we roll a fair six-sided die twice. The
sample space in this case is the set \(\Omega = \{(x, y) \mid x = 1, \dots, 6,\ y =
1, \dots, 6\}\). Suppose I roll a 2 on the first try. I want to know what
the probability of rolling a combined score of 8 is. That is, given
that I've rolled a 2, I wish to determine the conditional probability
of rolling a 6.

Since the die is fair, every pair of values \((x, y) \in \Omega\) is equally
likely. There are 36 elements in \(\Omega\), so each is
assigned a probability of 1/36. That is, \((\Omega, \mathcal{F}, P)\) is defined so that
\(P((x, y)) = 1/36\) for each \((x, y) \in \Omega\).

Let \(E\) be the event that we roll a 2 on the first try. We wish to
assign a new set of probabilities to the elements of \(\Omega\) to reflect this
information. We know that our final outcome must have the form
\((2, y)\), where \(y \in \{1, \dots, 6\}\). In essence, \(E\) becomes our new sample
space. Furthermore, we know that each of these outcomes is equally
likely because the die is fair. Thus, we may assign \(P((2, y)|E) = 1/6\)
for each \(y \in \{1, \dots, 6\}\) and \(P((x, y)|E) = 0\) whenever \(x \neq 2\), that is,
whenever \((x, y) \notin E\). This last definition occurs because we know that we've
already observed a 2 on the first roll, so it's impossible to see another
first number not equal to 2.

At last, we can answer the question we originally posed. The only
way to obtain a sum equal to 8 is to roll a six on the second attempt.
Thus, the probability of rolling a combined score of 8 given a 2 on
the first roll is 1/6.


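Lemma B.36. Let \((\Omega, \mathcal{F}, P)\) be a discrete probability space and
suppose that event \(E \subseteq \Omega\). Then, \((E, \mathcal{F}_E, P_E)\) is a discrete probability
space when

\[ P_E(F) = \frac{P(F)}{P(E)} \tag{B.7} \]

for all \(F \subseteq E\) and \(P_E(\omega) = 0\) for any \(\omega \notin E\).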


Remark B.37. The previous lemma gives us a direct way to
construct \(P(F|E)\) for arbitrary \(F \subseteq \Omega\). Clearly, if \(F \subseteq E\), then

\[ P(F|E) = P_E(F) = \frac{P(F)}{P(E)}. \]

Now, suppose that \(F\) is not a subset of \(E\) but that \(F \cap E \neq \emptyset\). Then,
clearly, the only outcomes in \(F\) that can occur, given that \(E\) has
occurred, are the ones that are also in \(E\). Thus, \(P_E(F) = P_E(E \cap F)\).
More to the point, we have

\[ P(F|E) = P_E(F \cap E) = \frac{P(F \cap E)}{P(E)}. \tag{B.8} \]

Definition B.38 (Conditional probability). Given a discrete
probability space \((\Omega, \mathcal{F}, P)\) and an event \(E \in \mathcal{F}\), the conditional
probability of event \(F \in \mathcal{F}\) given event \(E\) is

\[ P(F|E) = \frac{P(F \cap E)}{P(E)}. \tag{B.9} \]

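
The conditional probability formula translates directly into code once events are sets. The following is a minimal sketch that recomputes the two-roll question from Example B.35 (a combined score of 8, given a 2 on the first roll); the helper names `prob` and `conditional` are hypothetical, and a uniform distribution on the 36 outcomes is assumed.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a fair die: 36 equally likely ordered pairs.
OMEGA = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event under the uniform distribution on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def conditional(F, E):
    """P(F | E) = P(F and E) / P(E); undefined when P(E) = 0."""
    if prob(E) == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return prob(F & E) / prob(E)


E = {w for w in OMEGA if w[0] == 2}    # first roll is a 2
F = {w for w in OMEGA if sum(w) == 8}  # combined score is 8

print(prob(F), conditional(F, E))  # 5/36 1/6
```

Conditioning changes the answer here: the unconditional chance of a combined score of 8 is 5/36, while knowing the first roll is a 2 raises it to 1/6.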
Example B.39 (Simple blackjack). Blackjack is a game in which 
decisions can be made entirely based on conditional probabilities. 
The chances of a card appearing are based entirely on whether or 
not you have seen that card already since cards are discarded as the 
dealer works her way through the deck. 

Consider a simple game of blackjack played with only the cards 
A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, and K. In this game, the dealer deals 
two cards to the player and two to herself. The objective is to obtain 
a score as close to 21 as possible without going over. Face cards are 
worth 10, A is worth 1 or 11, and all other cards are worth their 
respective face value. We’ll assume that the dealer must hit (take a 
new card) on 16 and below and will stand on 17 and above. 

The complete sample space in this case is very complex; it consists 
of all possible valid hands that could be dealt over the course of a 
standard play of the game. We can however consider a simplified 
sample space of hands after the initial deal. In this case, the sample 
space has the form 


\[ \Omega = \{((x, y), (s, t))\}. \]

Here, \(x\), \(y\), \(s\), and \(t\) are cards without repeats. The total size of the
sample space is

\[ 13 \times 12 \times 11 \times 10 = 17{,}160. \]

This can be seen by noting that the player can receive any of the
13 cards as the first card and any of the remaining 12 cards for the
second card. The dealer then receives one of the 11 remaining cards
and then one of the 10 remaining cards.

Let's suppose that the player is dealt 10 and 6 for a score of 16,
while the dealer receives a 4 and 5 for a total of 9. If we suppose that
the player decides to hit, then the large sample space (\(\Omega\)) becomes

\[ \Omega = \{((x, y, z), (s, t))\}, \]

which has a size of

\[ 13 \times 12 \times 11 \times 10 \times 9 = 154{,}440, \]

while the event is

\[ E = \{((10, 6, z), (4, 5))\}. \]

There are nine possible values for \(z\), and thus, \(P(E) = 9/154{,}440\).
Let us now consider the probability of busting on our first hit.
This is event \(F\) and is given as

\[ F = \{((x, y, z), (s, t)) : x + y + z > 21\}. \]

(Here, we take some liberty by assuming that we can add card values
like digits.)
The set \(F\) is very complex, but we can see immediately that

\[ E \cap F = \{((10, 6, z), (4, 5)) : z \in \{7, 8, 9, J, Q, K\}\} \]

because these are the hands that will cause us to bust. Thus, we can
easily compute

\[ P(F|E) = \frac{P(E \cap F)}{P(E)} = \frac{6/154{,}440}{9/154{,}440} = \frac{6}{9} = \frac{2}{3}. \]

Thus, the probability of not busting given the hand we have drawn
must be 1/3. We can see at once that our odds when taking a hit
are not very good. Depending on the probabilities associated with
the dealer busting, it may be smarter for us to not take a hit and
see what happens to the dealer; however, in order to be sure, we'd
have to work out the chances of the dealer busting (since we know
she will continue to hit until she busts or exceeds our value of 16).
This computation is quite tedious, so we will not include it here.
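
The bust probability in this example can be confirmed by listing the cards still in the 13-card deck. This is a rough sketch under the example's assumptions (one card of each rank, player holding 10 and 6, dealer holding 4 and 5); counting the ace as 1 for the player's third card is an added assumption, and the names `CARD_VALUES`, `remaining`, and `p_bust` are illustrative.

```python
from fractions import Fraction

# Point values for the simplified deck. The ace is counted as 1 here,
# since counting it as 11 could only hurt a player who already holds 16.
CARD_VALUES = {
    "A": 1, "2": 2, "3": 3, "4": 4, "5": 5, "6": 6, "7": 7,
    "8": 8, "9": 9, "10": 10, "J": 10, "Q": 10, "K": 10,
}

player = ["10", "6"]
dealer = ["4", "5"]

# Cards not yet dealt: the possible values of the third player card z.
remaining = [c for c in CARD_VALUES if c not in player + dealer]

score = sum(CARD_VALUES[c] for c in player)
bust_cards = [c for c in remaining if score + CARD_VALUES[c] > 21]

# Each remaining card is equally likely, so this ratio is P(F | E).
p_bust = Fraction(len(bust_cards), len(remaining))

print(len(remaining), bust_cards, p_bust)
# 9 ['7', '8', '9', 'J', 'Q', 'K'] 2/3
```

Working on the nine remaining cards directly is equivalent to dividing 6/154,440 by 9/154,440, because the common factor cancels.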


Remark B.40. The complexity associated with blackjack makes
knowing exact probabilities difficult, if not impossible. Thus, most
card-counting strategies use heuristics to attempt to understand
approximately what the probabilities are for winning given the
history of observed hands. To do this, simple numeric values are assigned
to cards: generally a +1 to cards with low values (2, 3, 4, etc.), a 0
to cards with mid-range values (7, 8, and 9), and negative values for
face cards (10, J, Q, and K). As the count gets high, there are more
face cards in the deck, and thus, the chances of the dealer busting or
the player drawing blackjack increase. If the count is low, there are
fewer face cards in the deck, and the chance of the dealer drawing
a sufficient number of cards without busting is higher. Thus, players
favor tables with high counts.


Definition B.41 (Independence). Let \((\Omega, \mathcal{F}, P)\) be a discrete
probability space. Two events \(E, F \in \mathcal{F}\) are called independent if
\(P(E|F) = P(E)\) and \(P(F|E) = P(F)\).


Theorem B.42. Let \((\Omega, \mathcal{F}, P)\) be a discrete probability space. If
\(E, F \in \mathcal{F}\) are independent events, then \(P(E \cap F) = P(E)P(F)\).


Example B.43. Consider rolling a fair die twice in a row. Let \(\Omega\)
be the sample space of pairs of die results that will occur. Thus,
\(\Omega = \{(x, y) \mid x = 1, \dots, 6,\ y = 1, \dots, 6\}\). Let \(E\) be the event that says
we obtain a 6 on the first roll. Then, \(E = \{(6, y) : y = 1, \dots, 6\}\).
Let \(F\) be the event that says we obtain a 6 on the second roll.
Then, \(F = \{(x, 6) : x = 1, \dots, 6\}\). Obviously, these two events are
independent. The first roll cannot affect the outcome of the second
roll, thus \(P(F|E) = P(F)\). We know that \(P(E) = P(F) = 1/6\). That
is, there is a one in six chance of observing a 6. Thus, the chance of
rolling double sixes in two rolls is precisely the probability of both
events \(E\) and \(F\) occurring. Using our result on independent events,
we can see that \(P(E \cap F) = P(E)P(F) = (1/6)^2 = 1/36\), just as we
expect it to be.
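
The product rule for independent events can be tested by enumeration as well. The sketch below assumes the same two-roll sample space; the event `G` (the two rolls sum to 12) is an extra event introduced here for contrast and does not appear in the example, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a fair die: 36 equally likely ordered pairs.
OMEGA = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event under the uniform distribution on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def independent(A, B):
    """True when P(A and B) = P(A) P(B)."""
    return prob(A & B) == prob(A) * prob(B)


E = {w for w in OMEGA if w[0] == 6}      # 6 on the first roll
F = {w for w in OMEGA if w[1] == 6}      # 6 on the second roll
G = {w for w in OMEGA if sum(w) == 12}   # contrast event: rolls sum to 12

print(prob(E & F), independent(E, F))  # 1/36 True
print(prob(E & G), independent(E, G))  # 1/36 False
```

Events `E` and `G` are not independent: a sum of 12 forces a 6 on the first roll, so learning that `G` occurred changes the probability of `E` from 1/6 to 1.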


Remark B.44. The previous result will help in understanding 
Theorem 10.16. 


B.4 Exercises


Exercise B.1 

A fair four-sided die is rolled. Assume the sample space of interest
to be the number appearing on the die and the numbers run from
1 to 4. Identify the space \(\Omega\) precisely and all the possible outcomes
and events within the space. What is the (logical) fair probability
distribution in this case? [Hint: See Example B.17.]


Exercise B.2 

Prove the following: Let \(E \subseteq \Omega\), and define \(E^c\) to be the set of
elements of \(\Omega\) not in \(E\) (this is called the complement of \(E\)). Suppose
\((\Omega, \mathcal{F}, P)\) is a discrete probability space. Show that \(P(E^c) = 1 -
P(E)\).


Exercise B.3 
Prove Lemma B.22. [Hint: Show that \(E \cap F\) and \(E \cap F^c\) are mutually
exclusive events. Then, show that \(E = (E \cap F) \cup (E \cap F^c)\).]


Exercise B.4 

Suppose that I change the definition of \(F_i\) in Example B.24 to read:
value \(i\) appears on either die, while keeping the definition of event \(E\)
the same. Do we still have

\[ P(E) = \sum_{i=1}^{6} P(E \cap F_i)? \]

If so, show the computation. If not, explain why.


Exercise B.5 

Use Definition B.38 to compute the probability of obtaining a sum of
8 in two rolls of a die given that in the first roll, a 1 or 2 appears. [Hint:
The space of outcomes is still \(\Omega = \{(x, y) \mid x = 1, \dots, 6,\ y = 1, \dots, 6\}\).
First, identify the event \(E\) within this space. How many elements
within this set will enable you to obtain an 8 in two rolls? This is the
set \(E \cap F\). What is the probability of \(E \cap F\)? What is the probability
of \(E\)? Use the formula in Definition B.38. It might help to write out
the space \(\Omega\).]